Agents AI

Launch
ai

OpenAI and Broadcom unveil 'Jalapeño,' a custom chip built only for LLM inference

OpenAI revealed its first custom silicon, Jalapeño — a Broadcom-built ASIC designed to do one thing, run large language model inference, with engineering samples already in the lab and gigawatt-scale deployment targeted for late 2026.

AgentsAI NewsroomJune 29, 20262 min read

OpenAI and Broadcom have unveiled Jalapeño, OpenAI's first custom-designed AI chip, an accelerator built to do exactly one thing: run large language model inference as efficiently as possible. The companies introduced the chip on June 24, calling it the product of a deep hardware-software co-design effort rather than a general-purpose part.

Built narrow on purpose

Unlike NVIDIA's H100 or GB200 — flexible processors meant to handle both training and inference across many workloads — Jalapeño is deliberately specialized for LLM inference. OpenAI argues that by giving up generality it can win on the metric that matters most at deployment scale: performance per watt. The company says engineering samples are already running machine-learning workloads in the lab at target frequency and power, including its own GPT-5.3-Codex-Spark model, and that early testing points to efficiency "substantially better" than current state-of-the-art accelerators.

The chip was co-developed from initial design to manufacturing tape-out in roughly nine months, which OpenAI characterizes as one of the fastest cycles ever for a high-performance ASIC. Part of that speed, the company notes, came from using its own models to accelerate portions of the design and optimization work.

A full-stack play

Jalapeño is the clearest signal yet that OpenAI intends to own its infrastructure end to end — not just frontier models and consumer products, but the chip architecture, kernels, memory, networking and scheduling underneath them. The strategic logic is straightforward: inference, not training, is where compute costs compound as usage grows, and controlling the silicon is the most direct lever on the cost of serving models like ChatGPT.

The companies are targeting initial deployments by the end of 2026, with gigawatt-scale rollouts planned alongside Microsoft and other partners. Broadcom, which handled silicon implementation, frames the program as a multi-year effort that ramps through the back half of the decade.

For OpenAI, the move also reduces dependence on a supply chain dominated by NVIDIA at a moment when access to accelerators is a competitive bottleneck across the industry. Whether Jalapeño's narrow design delivers the efficiency gains OpenAI is promising will not be clear until the chips run production traffic — but the bet reflects how central inference economics have become to the business of frontier AI.

AI-assisted reporting, overseen by the AgentsAI team. Spotted an error? Let us know.