OpenAI and Broadcom unveil 'Jalapeño,' a custom chip built only for LLM inference
OpenAI revealed its first custom silicon, Jalapeño — a Broadcom-built ASIC designed to do one thing, run large language model inference, with engineering samples already in the lab and gigawatt-scale deployment targeted for late 2026.
OpenAI and Broadcom have unveiled Jalapeño, OpenAI's first custom-designed AI chip, an accelerator built to do exactly one thing: run large language model inference as efficiently as possible. The companies introduced the chip on June 24, calling it the product of a deep hardware-software co-design effort rather than a general-purpose part.
Built narrow on purpose
Unlike NVIDIA's H100 or GB200, flexible processors meant to handle both training and inference across many workloads, Jalapeño is deliberately specialized for LLM inference. OpenAI argues that by giving up generality it can win on the metric that matters most at deployment scale: performance per watt. The company says engineering samples are already running machine-learning workloads, including its own GPT-5.3-Codex-Spark model, in the lab at target frequency and power, and that early testing points to efficiency "substantially better" than current state-of-the-art accelerators.
The chip went from initial design to tape-out in roughly nine months, which OpenAI characterizes as one of the fastest cycles ever for a high-performance ASIC. Part of that speed, the company notes, came from using its own models to accelerate portions of the design and optimization work.
A full-stack play
Jalapeño is the clearest signal yet that OpenAI intends to own its infrastructure end to end — not just frontier models and consumer products, but the chip architecture, kernels, memory, networking and scheduling underneath them. The strategic logic is straightforward: inference, not training, is where compute costs compound as usage grows, and controlling the silicon is the most direct lever on the cost of serving models like ChatGPT.
The companies are targeting initial deployments by the end of 2026, with gigawatt-scale rollouts planned alongside Microsoft and other partners. Broadcom, which handled silicon implementation, frames the program as a multi-year effort that ramps through the back half of the decade.
For OpenAI, the move also reduces dependence on a supply chain dominated by NVIDIA at a moment when access to accelerators is a competitive bottleneck across the industry. Whether Jalapeño's narrow design delivers the efficiency gains OpenAI is promising will not be clear until the chips run production traffic — but the bet reflects how central inference economics have become to the business of frontier AI.
AI-assisted reporting, overseen by the AgentsAI team.