OpenAI has placed a massive $10 billion bet against the GPU status quo. In a definitive move to secure its supply chain, the company finalized a deal with chipmaker Cerebras Systems to deploy 750 megawatts (MW) of compute capacity. This isn’t for training new models; it is dedicated exclusively to inference—the actual running of AI applications.
The partnership represents one of the largest non-GPU hardware commitments in history. Starting in late 2026 and rolling out through 2028, OpenAI will bring these wafer-scale systems online to handle the crushing demand for ChatGPT and its agentic successors. By diversifying away from an all-Nvidia fleet, OpenAI secures a critical lifeline for low-latency compute while handing Cerebras the commercial anchor it desperately needed before its public offering.
The Pivot: From Training to “Doing”

For years, the industry obsessed over training—feeding petabytes of text into the beast to teach it patterns. That era is peaking. As GPT-5 enters production, the real bottleneck is no longer teaching the model; it’s getting the model to answer.
Standard GPU clusters are great at parallel processing, but they can be sluggish at sequential tasks like real-time conversation or complex code generation. Data has to hop between thousands of separate chips, creating lag.
Cerebras takes a different approach. They don’t just build a bigger chip; they effectively print the entire cluster onto a single wafer. Their flagship CS-3 system uses the Wafer-Scale Engine (WSE-3), a processor the size of a dinner plate. It integrates memory and compute on a single slab of silicon. The result? Data barely has to travel. This architecture can hit inference speeds up to 15 times faster than conventional setups. For OpenAI, this means faster responses and the ability to run complex “reasoning” models without the awkward pause.
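To see why the geometry matters, here is a minimal back-of-envelope sketch. It assumes token-by-token generation is purely memory-bandwidth bound and uses rough public ballpark figures for off-chip HBM versus on-wafer SRAM bandwidth; it ignores compute limits, memory capacity, and multi-device sharding, and none of the numbers come from this deal.

```python
# Back-of-envelope model of why keeping weights next to compute cuts latency.
# Token-by-token generation is usually memory-bandwidth bound: each new token
# requires streaming the model's weights past the compute units once.
# All figures are rough public ballparks, not numbers from the OpenAI deal.

def decode_tokens_per_second(params_billion: float,
                             bytes_per_param: float,
                             bandwidth_tb_per_s: float) -> float:
    """Upper bound on single-stream decode speed, assuming the run is purely
    memory-bandwidth bound (ignores compute, KV cache, and networking)."""
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    bandwidth_bytes_per_s = bandwidth_tb_per_s * 1e12
    return bandwidth_bytes_per_s / bytes_per_token

MODEL_B, BYTES = 70, 2  # a 70B-parameter model in 16-bit weights (illustrative)

# GPU-class accelerator fed by roughly 3 TB/s of off-chip HBM (approximate).
print(f"HBM-fed GPU ceiling:   {decode_tokens_per_second(MODEL_B, BYTES, 3):>10,.0f} tok/s")

# Wafer-scale part with on-chip SRAM bandwidth in the petabyte/s range (approximate).
print(f"On-wafer SRAM ceiling: {decode_tokens_per_second(MODEL_B, BYTES, 21_000):>10,.0f} tok/s")
```

The absolute numbers are crude, but the ratio is the point: when the weights sit on the same silicon as the compute, the bandwidth ceiling, and with it the latency floor, shifts by orders of magnitude.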
The Deal by the Numbers
OpenAI isn’t turning this all on at once. It’s a phased rollout designed to mitigate risk while scaling up capacity.
| Metric | Deal Specifications |
|---|---|
| Total Power Capacity | 750 Megawatts (MW) |
| Estimated Value | > $10 Billion USD |
| Hardware Provider | Cerebras Systems CS-3 / WSE-3 |
| Primary Workload | Low-latency AI Inference |
| Rollout Start | Q3 2026 |
| Completion Target | 2028 |
| Key Objective | Reduce Nvidia dependency; accelerate real-time agentic AI |
To put 750 MW in perspective: a standard hyperscale data center draws about 30 MW to 50 MW. OpenAI has effectively commissioned the equivalent of 15 to 25 such facilities, dedicated entirely to inference.
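The arithmetic behind that comparison is simple; the sketch below just divides the committed capacity by the typical per-facility draw quoted above.

```python
# Quick scale check on the 750 MW figure (illustrative arithmetic only).
TOTAL_MW = 750
FACILITY_MW_LOW, FACILITY_MW_HIGH = 30, 50  # typical hyperscale data center draw

fewest = TOTAL_MW / FACILITY_MW_HIGH  # if every site draws 50 MW
most = TOTAL_MW / FACILITY_MW_LOW     # if every site draws 30 MW
print(f"Equivalent facilities: {fewest:.0f} to {most:.0f}")  # -> 15 to 25
```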
Buying Insurance Against a GPU Drought
Sam Altman is hedging his bets. The “resilient portfolio” strategy is a direct response to the supply chain fragility that choked the market in 2023. Reliance on a single vendor—even one as capable as Nvidia—is a vulnerability OpenAI can no longer afford.
Nvidia is still the king of training (see the recent 10GW partnership), but OpenAI is shopping around for everything else. The current hardware roster looks like this:
- AMD: A 6GW agreement for MI450 GPUs.
- Broadcom: An ongoing collaboration on custom silicon.
- Cerebras: The new primary partner for speed.
Sachin Katti, OpenAI’s infrastructure lead, put it simply: match the system to the workload. Don’t waste a precious Nvidia H100 on a task a Cerebras wafer can do faster and cheaper.
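Here is a hypothetical sketch of what that workload-matching logic could look like. The pool names, workload kinds, and latency threshold are invented for illustration; nothing here describes OpenAI's actual scheduler.

```python
# A hypothetical "match the system to the workload" router.
# Pool names and routing rules are invented for illustration only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Workload:
    kind: str                        # "training", "batch_inference", "interactive_inference"
    latency_sla_ms: Optional[int] = None

def pick_pool(w: Workload) -> str:
    """Send each job to the hardware pool that suits it best."""
    if w.kind == "training":
        return "gpu-training-cluster"        # massive parallel training runs
    if w.kind == "interactive_inference" and (w.latency_sla_ms or 1_000) <= 100:
        return "wafer-scale-inference-pool"  # lowest-latency token generation
    return "gpu-batch-inference-pool"        # throughput-oriented batch serving

print(pick_pool(Workload("interactive_inference", latency_sla_ms=50)))  # -> wafer-scale pool
print(pick_pool(Workload("batch_inference")))                           # -> batch GPU pool
```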
Cerebras: From Niche to Mainstream
Cerebras needed this. Badly. Before this agreement, the Sunnyvale company had a dangerous problem: nearly 87% of its revenue was tied to a single client, G42. Investors get nervous when they see that kind of concentration.
This deal fixes the narrative. It proves the Wafer-Scale Engine isn't just an impressive science project; it's ready for critical commercial workloads. With a $10 billion backlog locked in through 2028, Cerebras transforms from a "promising startup" into a credible third player in the hardware race, sitting distinctly apart from the Nvidia/AMD duopoly.
The Power Problem

You can buy the chips, but can you plug them in? The primary constraint for AI in 2026 isn’t silicon; it’s electricity.
Deploying 750 MW is a logistical nightmare. These systems run hot. The CS-3 is incredibly dense, meaning standard air cooling won't cut it; it requires complex liquid cooling loops and massive power feeds. Success here depends on more than hardware delivery. It depends on grid upgrades and on whether partners such as Constellation can actually restart nuclear assets like Three Mile Island to provide the necessary baseload power.
What Comes Next
We are leaving the “training at all costs” era. Now, it’s about efficiency. When AI agents start doing real work—booking flights, writing entire software modules, handling customer service—latency isn’t just annoying; it breaks the product.
OpenAI is paying a premium to ensure its models don't lag. By locking in massive capacity for high-speed inference now, it is trying to outrun competitors stuck on standard GPU clouds. If Cerebras delivers, the deal could force a total rethink of data center architecture, moving the industry away from general-purpose chips and toward specialized processors for the daily grind of AI.