
OpenAI and Broadcom Just Designed a Chip That Makes AI Actually Profitable

The new inference chip aims to slash costs and energy use.

Alex Novak | Source: Ars Technica

The silicon race is heating up as chipmakers struggle to keep up with demand for AI compute. But OpenAI and Broadcom aren't just trying to build a faster chip; they're trying to build a cheaper one. And that might matter more.

On Wednesday, the two companies announced a new application-specific integrated circuit (ASIC) designed specifically for large language model inference. Translation: a chip built to run AI models once they're trained, not to train them. It's a bet that the real money in AI isn't in building the models—it's in running them at scale.

The chip, which doesn't yet have a flashy name, is engineered to handle the unique computational demands of transformer-based models like GPT-4 and its successors. Training chips are built for raw throughput: brute-force matrix multiplication fed by massive memory bandwidth. Inference chips have a different job: answer each request quickly, efficiently, and cheaply. OpenAI and Broadcom claim their design delivers a 40% improvement in performance-per-dollar over existing inference hardware from Nvidia.

That's a direct shot at Nvidia's dominance. For the past two years, Nvidia's H100 and B200 GPUs have been the de facto standard for both training and inference. But as AI deployment explodes—every Fortune 500 company wants its own chatbot, every startup wants to embed language models—the cost of inference has become a bottleneck. A single query to GPT-4 costs OpenAI roughly $0.01 in compute. Multiply that by a billion queries a day, and you're talking $10 million daily. The margins vanish fast.

OpenAI's CEO Sam Altman has been vocal about the need to reduce inference costs. "The biggest barrier to AGI isn't capability—it's economics," he told reporters in a briefing last week. "We need to make intelligence cheap enough that it's ubiquitous."

The Broadcom partnership is a logical next step. Broadcom has a long history of designing custom ASICs for networking and telecom—think the chips inside your Wi-Fi router or cable modem. They're masters of high-volume, low-cost silicon. OpenAI brings the architectural knowledge of what LLMs actually need at the transistor level. Together, they're aiming for a chip that can be mass-produced at a fraction of the cost of a GPU.

Why Inference Matters More Than Training

Here's a number that should make you sit up: by some estimates, 90% of AI compute spending will be on inference by 2028. Training a model like GPT-4 costs hundreds of millions of dollars—once. Running it costs millions every day. Forever. The industry is shifting from a model-building phase to a model-deploying phase, and the winners will be those who can serve the cheapest tokens.
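To see how quickly running costs overtake training costs, here is a rough sketch. The article only gives ranges, so the $300 million training bill is an assumed round number, and the $10 million daily figure is the back-of-envelope estimate from earlier, not a reported one.

```python
# Illustrative only: the inputs are assumed round numbers, not reported figures.

def break_even_days(training_cost_usd: float, daily_inference_usd: float) -> float:
    """Days of serving after which cumulative inference spend equals training spend."""
    return training_cost_usd / daily_inference_usd


def inference_share(training_cost_usd: float, daily_inference_usd: float, days: int) -> float:
    """Fraction of total spend that went to inference after `days` of serving."""
    inference_total = daily_inference_usd * days
    return inference_total / (training_cost_usd + inference_total)


if __name__ == "__main__":
    training = 300e6  # assumed one-time training bill
    daily = 10e6      # assumed daily inference bill
    print(f"break-even after {break_even_days(training, daily):.0f} days")
    print(f"inference share after one year: {inference_share(training, daily, 365):.1%}")
```

Under those assumptions, serving costs match the training bill within about a month and dominate total spend after a year. Different inputs move the dates, not the shape of the curve.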

Nvidia's GPUs are overkill for inference. They're designed for parallel matrix operations across thousands of cores, which is great for training but wasteful when you're generating a single response one token at a time and much of that compute sits idle waiting on memory. An inference-specialized chip can strip away the unnecessary compute units, add specialized low-precision arithmetic, and optimize the memory hierarchy for the linear, sequential nature of token generation.
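A common back-of-envelope estimate shows why memory, not raw compute, tends to set the pace. If every weight has to be read once per generated token, single-stream speed is capped at memory bandwidth divided by model size in bytes. The sketch below assumes exactly that (batch size of one, no caching, attention state ignored), and the model size and bandwidth are hypothetical, not specs of any real chip.

```python
def max_tokens_per_second(num_params: float, bytes_per_param: float,
                          bandwidth_bytes_per_s: float) -> float:
    """Upper bound on single-stream decode speed when weight reads dominate."""
    bytes_per_token = num_params * bytes_per_param  # every weight read once per token
    return bandwidth_bytes_per_s / bytes_per_token


if __name__ == "__main__":
    params = 70e9      # hypothetical model size
    bandwidth = 3e12   # hypothetical 3 TB/s of memory bandwidth
    for label, width in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
        rate = max_tokens_per_second(params, width, bandwidth)
        print(f"{label} weights: at most {rate:.1f} tokens/s")
```

Halving the bytes per weight doubles the ceiling, which is one reason low-precision arithmetic is attractive for inference hardware.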

OpenAI's design reportedly includes a novel memory subsystem that reduces the time spent fetching weights from DRAM, a major bottleneck in inference. Instead of pulling every weight from off-chip memory each time it is needed, the chip caches frequently accessed parameters on-die, using a predictive algorithm to anticipate which parts of the model will be needed next. Think of it like a browser cache for neural networks.
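The companies haven't described the predictive algorithm, so the following is only a software analogy of the general idea, not the chip's design. It models a small least-recently-used cache of weight blocks and a deliberately simple predictor that assumes blocks are used in order. The class name, block counts, and cache size are all hypothetical.

```python
from collections import OrderedDict


class WeightBlockCache:
    """Toy LRU cache over weight blocks, with optional next-block prefetch."""

    def __init__(self, capacity: int, num_blocks: int, prefetch: bool = True):
        self.capacity = capacity        # blocks that fit in fast on-die memory
        self.num_blocks = num_blocks    # blocks in the whole model
        self.prefetch = prefetch
        self.resident = OrderedDict()   # block id -> None, oldest first
        self.hits = 0
        self.misses = 0

    def _load(self, block_id: int) -> None:
        """Make a block resident, evicting the least recently used if full."""
        if block_id in self.resident:
            self.resident.move_to_end(block_id)
            return
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)
        self.resident[block_id] = None

    def access(self, block_id: int) -> None:
        """Record a hit or miss, then optionally prefetch the next block."""
        if block_id in self.resident:
            self.hits += 1
        else:
            self.misses += 1
        self._load(block_id)
        if self.prefetch:
            # Assumed predictor: blocks are used in order, so fetch the next one.
            self._load((block_id + 1) % self.num_blocks)


def simulate(prefetch: bool, tokens: int = 3, num_blocks: int = 8) -> WeightBlockCache:
    """Walk every block in order once per generated token."""
    cache = WeightBlockCache(capacity=2, num_blocks=num_blocks, prefetch=prefetch)
    for _ in range(tokens):
        for block_id in range(num_blocks):
            cache.access(block_id)
    return cache


if __name__ == "__main__":
    for flag in (False, True):
        c = simulate(prefetch=flag)
        print(f"prefetch={flag}: hits={c.hits} misses={c.misses}")
```

With a cache far smaller than the model, plain LRU misses on every access, while the prefetching version misses only on the very first block. The toy treats prefetches as free; in real hardware they still consume memory bandwidth, so the benefit is hidden latency rather than less data moved.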

Broadcom's manufacturing expertise means the chip can be built using a mature 5nm process, keeping yields high and costs low. First samples are expected in early 2027, with volume production later that year. OpenAI plans to use the chip in its own data centers first, then eventually sell access to the hardware through Azure and other cloud providers.

The Bigger Picture: A Chip Arms Race

OpenAI and Broadcom aren't the only ones chasing the inference prize. Google has its TPU v5, Amazon has Trainium and Inferentia, Microsoft is rumored to be designing its own AI chip with Intel, and a host of startups like Groq and Cerebras are pushing novel architectures. But OpenAI's partnership is unique because of the sheer scale of its deployment. The company is running one of the largest inference workloads on the planet: ChatGPT alone handles over 100 million queries per day. If the claimed 40% performance-per-dollar gain holds up, OpenAI could either pocket the savings or drop prices to crush competitors.
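One subtlety worth spelling out: a 40% improvement in performance-per-dollar is not the same thing as a 40% cost cut. A minimal sketch of the conversion, assuming the workload stays the same size and the gain applies uniformly:

```python
def cost_reduction(perf_per_dollar_gain: float) -> float:
    """Fractional drop in cost for a fixed workload, given a perf-per-dollar gain.

    A gain of 0.40 means each dollar buys 1.4x the work, so the same work
    costs 1 / 1.4 of what it did before.
    """
    return 1.0 - 1.0 / (1.0 + perf_per_dollar_gain)


if __name__ == "__main__":
    print(f"{cost_reduction(0.40):.1%} lower cost for the same workload")
```

So the claimed gain works out to a bit under a 29% reduction in spend for the same number of queries. Still large, but smaller than the headline number suggests.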

That's the real threat. OpenAI's API pricing has already fallen 95% since GPT-3 launched in 2020. A custom chip accelerates that trend. Competitors like Anthropic and Google DeepMind will feel the pressure to either develop their own silicon or strike similar deals. Nvidia, for its part, isn't sitting still: the company is rumored to be working on an inference-specific variant of its Blackwell architecture.

But there's a catch: designing a custom chip is expensive and slow. The NRE (non-recurring engineering) cost for a 5nm ASIC can run $50 million or more, and the timeline from design to tape-out to production is 18-24 months. By the time OpenAI's chip hits the market, Nvidia could have another generation of GPUs out. The bet is that the cost savings will be sustainable over multiple generations.
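To put the engineering bill in perspective, here is a rough payback sketch. It combines the $50 million NRE figure with the article's earlier back-of-envelope estimate of $10 million a day in inference spend and the claimed 40% performance-per-dollar gain. It deliberately ignores per-chip manufacturing, data center, and software costs, all of which would stretch the real payback period considerably.

```python
def nre_payback_days(nre_usd: float, daily_spend_usd: float,
                     perf_per_dollar_gain: float) -> float:
    """Days of savings needed to recover a one-time engineering cost.

    Assumes the whole workload moves to the new chip on day one and that
    NRE is the only upfront cost, which is a strong simplification.
    """
    daily_savings = daily_spend_usd * (1.0 - 1.0 / (1.0 + perf_per_dollar_gain))
    return nre_usd / daily_savings


if __name__ == "__main__":
    days = nre_payback_days(nre_usd=50e6, daily_spend_usd=10e6, perf_per_dollar_gain=0.40)
    print(f"NRE recovered after roughly {days:.1f} days of savings")
```

On these numbers the NRE alone is small next to the daily bill, which suggests the real risk is less the design cost than the 18-to-24-month delay before any savings arrive.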

What This Means for the Rest of Us

If you're an AI user, cheaper inference means more capable free tiers, faster responses, and new applications that were previously too expensive to run. Think real-time video analysis, voice assistants that actually understand context, and AI tutors that don't cost a subscription fee. For businesses, it means bots that don't eat into margins.

But it also means more centralization. OpenAI will control the entire stack—model, hardware, and cloud—locking customers into its ecosystem. The vision of AI as a commodity utility, like electricity, becomes harder to realize if one company owns the power plants and the grid. Antitrust regulators are already eyeing the AI market with suspicion. This chip deal won't cool those concerns.

Then there's the environmental angle: inference at scale is energy-intensive. The claimed 40% gain is in performance-per-dollar, not energy, but even if it carries over to power draw, the rebound effect could erase those gains if lower costs spur much higher usage. The Jevons paradox applies to AI too: cheaper compute means more compute, not less.
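The rebound arithmetic is simple enough to sketch. This assumes the 40% gain translates directly into work done per unit of energy, which the announcement does not establish, and treats usage growth as a single multiplier.

```python
def relative_energy(efficiency_gain: float, usage_multiplier: float) -> float:
    """Total energy use relative to today (1.0 means unchanged).

    Energy per query falls to 1 / (1 + gain); total energy scales with usage.
    """
    return usage_multiplier / (1.0 + efficiency_gain)


def break_even_usage(efficiency_gain: float) -> float:
    """Usage multiplier at which the efficiency gain is fully cancelled out."""
    return 1.0 + efficiency_gain


if __name__ == "__main__":
    gain = 0.40  # assumed to apply to energy, not just dollars
    print(f"gains erased once usage grows {break_even_usage(gain):.1f}x")
    for usage in (1.0, 1.4, 2.0):
        print(f"usage x{usage}: energy at {relative_energy(gain, usage):.0%} of today")
```

In other words, usage only has to grow 40% for the saving to disappear, and a doubling leaves total energy use well above where it started.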

OpenAI and Broadcom are making a clear statement: the future of AI is custom silicon. GPUs were a wonderful hack, but they were never designed for this. The era of general-purpose computing for AI is ending. Specialization is here, and it's going to reshape the economics of intelligence.

The press release is full of boilerplate about "unlocking potential" and "democratizing AI." Ignore that. This is really a power play. OpenAI is building a moat with silicon, and Broadcom is happy to sell them the shovels.
