The GPU capacity market has crossed a threshold: supply is no longer something you procure on demand; it is something you reserve years in advance or compete for the scraps.

What Happened

Three signals from this week crystallize the shift. First, hyperscaler earnings across Amazon, Google, Meta, and Microsoft confirm that growth is now gated by power, chips, and capex, not software. The constraint is physical. Second, Microsoft's Azure backlog has hit $627 billion, with internal bottlenecks in power and cooling identified as the primary brake on delivery. Clients booking Azure H200 or B200 capacity today are not getting it this quarter. Third, and most structurally significant: gigawatt-scale compute deals are pre-selling capacity at the platform level. The Google-Anthropic 5GW commitment is not an outlier. It is the new template for how frontier AI labs lock in training infrastructure. Spot-market GPU procurement, the model most enterprises and scaleups still rely on, is becoming a residual market for whatever capacity the big commitments did not absorb.

Layered on top of this: AMD just signed a second 25MW colocation lease at Riot's Texas campus, doubling down on the same site. When chip vendors are pre-securing raw compute real estate ahead of customer commitments, that is a leading indicator that the colocation pipeline is tightening too. Power-ready, network-dense, Tier III (the data center reliability tier targeting 99.982% uptime) space in markets like Northern Virginia, Dallas, and Phoenix is increasingly spoken for before the first GPU is racked.
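For context, that Tier III availability target translates into a concrete annual downtime budget. A quick back-of-the-envelope calculation (the 99.982% figure is the tier definition cited above; the rest is simple arithmetic):

```python
# Annual downtime implied by a Tier III availability target of 99.982%.
HOURS_PER_YEAR = 8760  # 365 days * 24 hours (ignoring leap years)

uptime = 0.99982
downtime_hours = (1 - uptime) * HOURS_PER_YEAR

print(f"Allowed downtime: {downtime_hours:.2f} hours/year")  # ~1.58 hours/year
```

Roughly an hour and a half of unplanned downtime per year, which is why this class of space commands a premium and gets reserved early.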

Why It Matters

The mechanism here is straightforward but its implications are underappreciated. Hyperscalers (the largest cloud providers, specifically AWS, Azure, GCP, and Oracle) are allocating capital at a pace that their power interconnection queues and chip supply chains cannot match. The result is extended lead times, not just for on-demand instances but for reserved capacity. Wait lists for H200 and B200 clusters on major hyperscaler platforms now stretch multiple quarters. For a Fortune 500 enterprise rolling out AI infrastructure for the first time, that timeline is operationally unacceptable when a board-level AI initiative has a fiscal-year deadline.

This is precisely where neocloud operators (specialized GPU cloud providers, an alternative to hyperscalers) have carved out a durable advantage. Because they operate leaner balance sheets and purpose-built GPU infrastructure, the neocloud operators we work with are consistently delivering reserved H100, H200, and B200 capacity in weeks, not quarters. Pricing on reserved instances frequently runs 30-50% below comparable hyperscaler rates, with more flexible contract structures. Sovereign AI programs in the US and EU, which cannot absorb multi-quarter delays while national AI strategies are politically live, are increasingly evaluating neocloud alternatives as primary compute venues rather than fallback options.

The Pentagon's multi-vendor AI contract awards to Nvidia, Google, Microsoft, Amazon, and others reinforce a related point: sophisticated infrastructure clients are deliberately diversifying across vendors. A single-vendor compute stack is now viewed as a strategic liability, not a simplification. The smart portfolio runs hyperscalers for certain regulated or latency-sensitive workloads, neocloud operators for cost-efficient reserved training and inference capacity, and owned or leased colocation for clients who want dedicated hardware without cloud pricing.

What Clients Should Do

If you are a frontier AI lab planning a 10,000-GPU training cluster, the Google-Anthropic deal is your competitive benchmark. You are not locking in 5GW, but the principle applies: capacity reserved 12-18 months forward comes with meaningfully better pricing and delivery certainty than capacity sourced six weeks before you need it. Neocloud operators with existing H200 and GB200 inventory can structure forward commitments. Start those conversations now.

If you are a Fortune 500 enterprise in financial services, pharma, or manufacturing entering AI infrastructure for the first time, do not assume the hyperscaler portal will deliver what you need on your timeline. Azure and GCP are excellent for some workloads. For dedicated GPU clusters, the neocloud market is faster, cheaper, and more negotiable. Run a parallel evaluation. The delta in total cost of ownership over a 12-month reserved term is often seven figures.
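To make that seven-figure claim concrete, here is an illustrative TCO comparison. The cluster size and hourly rate below are hypothetical placeholders, not quotes; the 40% discount sits inside the 30-50% range cited earlier:

```python
# Illustrative 12-month reserved TCO comparison for a dedicated GPU cluster.
# All figures are hypothetical assumptions for the sake of the arithmetic.
GPUS = 256                 # assumed cluster size
HOURS = 8760               # 12 months of reserved (always-on) capacity
HYPERSCALER_RATE = 4.50    # assumed $/GPU-hour reserved hyperscaler rate
NEOCLOUD_DISCOUNT = 0.40   # within the 30-50% range cited above

hyperscaler_tco = GPUS * HOURS * HYPERSCALER_RATE
neocloud_tco = hyperscaler_tco * (1 - NEOCLOUD_DISCOUNT)
delta = hyperscaler_tco - neocloud_tco

print(f"Hyperscaler 12-mo TCO: ${hyperscaler_tco:,.0f}")
print(f"Neocloud 12-mo TCO:    ${neocloud_tco:,.0f}")
print(f"Delta:                 ${delta:,.0f}")  # seven figures at this scale
```

Even at a few hundred GPUs, a mid-range discount produces a multi-million-dollar annual delta, which is why the parallel evaluation is worth the procurement effort.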

If you are a scaleup ramping inference capacity, the Anthropic-Fractile signal on SRAM-based inference chips (designed without DRAM, reducing memory cost and latency at scale) is worth tracking, but the immediate procurement decision is about H100 and H200 reserved capacity today. Locking in inference cluster capacity at current neocloud rates before the next wave of demand hits is the right trade. On the colocation side: if your roadmap includes bringing hardware in-house, power-ready Tier III space from operators like Equinix, Digital Realty, CyrusOne, or QTS in Dallas, Phoenix, or Northern Virginia is also moving. AMD leasing 25MW speculatively is a signal, not noise.

How XIRR Advisors Can Help

XIRR Advisors brokers reserved GPU capacity from neocloud operators and Tier III colocation space across the US. We do not broker AWS, Azure, or GCP directly since those platforms sell direct. Where we add value is in the neocloud and colocation markets, where pricing, availability, and contract terms vary significantly and are not publicly listed.

Share your requirements, whether that means GPU type, cluster size, region, and timing, or megawatts of colocation, and we will canvass the market on your behalf and return a shortlist within 48 hours. Earlier conversations consistently yield better terms. Our fee is paid by the provider; clients pay nothing. Reach us at contact@xirradvisors.com or DM @XIRRAdvisors.


GPU Markets · Neocloud · Hyperscaler · Enterprise AI · Reserved Compute
Tell Us What You're Sourcing


Tell us your needs (region, GPU type, capacity, and timing, or MW for colocation) and we'll canvass the neocloud and colocation markets on your behalf. Shortlist in 48 hours.

Earlier conversations get better terms: when you engage early, we have time to negotiate with vendors before you need to commit. You pay nothing; our fee is provider-paid.