GPU pricing is not transparent, not stable, and not equal across providers. What you pay depends almost entirely on when you ask, who you ask, and whether you understand the market well enough to push back.
This guide cuts through the noise. It covers current reserved GPU pricing across hyperscalers (the largest cloud providers: AWS, Azure, GCP, Oracle) and neoclouds (specialized GPU cloud providers like CoreWeave, Lambda Labs, Crusoe, and Nebius), explains why pricing is moving fast right now, and gives concrete guidance for different client types.
The Market Right Now: Supply Is Still Tight, But Fracturing
NVIDIA remains the dominant supplier. Its latest earnings filings confirm demand continues to outrun supply at the frontier, particularly for H200, B200, and the newer GB200/GB300 configurations. TSMC, which manufactures the silicon inside these chips, now derives nearly a third of total revenue from AI-related products, per The Next Platform. That concentration means any fab capacity hiccup ripples directly into GPU availability and spot pricing.
At the same time, agentic AI workloads are reshaping what clients actually need. Per Data Center Knowledge, long-horizon agent tasks are breaking existing throughput planning models. These workloads run longer, require more persistent memory, and stress clusters in ways that standard batch-training benchmarks never anticipated. The result: capacity you sized for fine-tuning may be underbuilt for inference at scale.
The Pricing Stack: What Reserved GPU Capacity Actually Costs
Prices below are approximate 2026 market rates for reserved capacity (1-to-3-year commitments). Spot and on-demand rates run 30 to 80 percent higher and are unreliable for production workloads.
H100 80GB SXM5
Hyperscalers (AWS p5, Azure NDv5, GCP A3): $2.80 to $3.50 per GPU-hour on 1-year reserved terms. Availability on new reservations: 3 to 6 month wait in most regions.
Neoclouds (CoreWeave, Lambda Labs, Crusoe, TensorWave): $1.85 to $2.50 per GPU-hour on equivalent reserved terms. Availability: 2 to 6 weeks. That is a 25 to 40 percent discount on like-for-like committed capacity.
H200 141GB SXM
Hyperscalers: $4.20 to $5.50 per GPU-hour. Waitlists are real: Azure and GCP are heavily backlogged in US East regions.
Neoclouds (Nebius, Nscale, CoreWeave, Voltage Park): $2.90 to $3.80 per GPU-hour. Meaningful capacity available now in the EU (Frankfurt, Amsterdam) and US (Dallas, Chicago).
B200 / GB200 NVL72
Hyperscalers: $7.00 to $10.00 per GPU-hour (early access, limited quoting). Most clients are on waitlists for Q3/Q4 2026 delivery.
Neoclouds: $5.50 to $7.50 per GPU-hour for the few providers with live inventory. CoreWeave and Lambda are the furthest ahead here. Expect aggressive pre-commitment discounts if you sign before capacity is live.
GB300 remains largely pre-revenue. Pricing conversations are happening at the cluster level, not per-GPU-hour. If you are a frontier lab or a sovereign AI program (government or quasi-government AI initiative), now is exactly when to have those conversations.
Why Hyperscalers Cost More and Deliver Later
Hyperscalers bundle networking, storage, managed services, and compliance tooling into their GPU SKUs. That bundle has real value for a Fortune 500 enterprise standing up its first AI infrastructure stack. You get a single vendor, a single MSA (Master Service Agreement, the parent contract governing the relationship), and SLAs (Service Level Agreements defining uptime guarantees) that legal teams recognize.
But frontier labs and AI scaleups rarely need that bundle. They have their own networking teams, their own storage infrastructure, their own compliance workflows. They are paying a 30 to 50 percent premium for services they do not use.
Neoclouds exist to serve that gap. They offer bare-metal GPU access with faster ramp times (the deployment timeline for capacity coming online), shorter minimum commitments, and prices benchmarked against raw hardware costs rather than enterprise service margins.
What Clients Should Do
If you are a Fortune 500 enterprise running your first serious AI workload, start with a hyperscaler for development and small-scale inference. But before you commit to a large reserved block, get a competing quote from CoreWeave, Lambda, or Crusoe. The difference on a 256-GPU reserved cluster over 12 months can exceed $2 million. Use the competing quote as leverage or take it.
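The $2 million figure follows directly from the reserved-rate gap quoted above. A quick sketch of that math, using the midpoints of the H100 price ranges in this guide (actual quotes vary by provider, region, and term):

```python
# Annual cost delta for a 256-GPU reserved H100 cluster.
# Rates are midpoints of the ranges quoted above; illustrative only.
HOURS_PER_YEAR = 8760  # 24 * 365

def annual_cost(gpus: int, rate_per_gpu_hour: float) -> float:
    """Yearly reserved cost; reserved capacity bills whether used or not."""
    return gpus * rate_per_gpu_hour * HOURS_PER_YEAR

hyperscaler = annual_cost(256, 3.15)  # midpoint of $2.80-$3.50
neocloud = annual_cost(256, 2.17)     # midpoint of $1.85-$2.50
savings = hyperscaler - neocloud

print(f"Hyperscaler: ${hyperscaler:,.0f}")  # ~$7.1M
print(f"Neocloud:    ${neocloud:,.0f}")     # ~$4.9M
print(f"Delta:       ${savings:,.0f}")      # ~$2.2M per year
```

Even at the conservative ends of both ranges, the gap on a cluster of this size runs well into seven figures, which is why a single competing quote carries real negotiating weight.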
If you are a frontier lab planning a 10,000-GPU training cluster, hyperscaler waitlists are not an option. You need to be negotiating directly with neocloud providers who have B200 or GB200 inventory, or colocation operators where you can bring your own hardware. Power is now a real constraint: per Data Center Dynamics, operators are signing 1GW-plus behind-the-meter (on-site power generation bypassing the public grid) gas supply deals in Ohio and Texas because grid interconnection queues stretch years. Colocation sites with existing power access command significant premiums. Lock one early.
If you are a sovereign AI program or a system integrator sourcing for a government client, EU-based capacity is increasingly the conversation. Nebius, Nscale, and Genesis Cloud all have meaningful EU footprints. Google's TPU 8 roadmap, per The Next Platform, signals that non-NVIDIA accelerators are maturing, which opens optionality, but for reserved capacity today, NVIDIA GPU-based clusters remain the default.
If you are a scaleup ramping inference, the smartest structure is a portfolio: a small hyperscaler footprint for burst and compliance cover, one or two neocloud reserved blocks as your cost-efficient base, and a colocation relationship for hardware you own. Clients running this architecture consistently beat single-vendor pricing by 35 to 50 percent.
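The 35 to 50 percent figure is a blended-rate effect. A sketch of that math, where the workload split and the amortized owned-hardware rate are illustrative assumptions rather than quotes:

```python
# Blended $/GPU-hour for the three-tier portfolio described above.
# Shares and the colocation rate are assumptions for illustration.
def blended_rate(allocations):
    """allocations: (share_of_workload, $/GPU-hour) pairs summing to 1.0."""
    assert abs(sum(share for share, _ in allocations) - 1.0) < 1e-9
    return sum(share * rate for share, rate in allocations)

single_vendor = 3.15  # hyperscaler reserved H100 midpoint from above

portfolio = blended_rate([
    (0.15, 3.15),  # hyperscaler footprint: burst + compliance cover
    (0.55, 2.17),  # neocloud reserved block: cost-efficient base
    (0.30, 1.10),  # owned hardware in colo (assumed amortized rate)
])

discount = 1 - portfolio / single_vendor
print(f"Blended: ${portfolio:.2f}/GPU-hr, {discount:.0%} below single-vendor")
```

Under these assumptions the blend lands around 37 percent below the single-vendor rate; shifting more of the base load onto owned hardware pushes it toward the top of the quoted range.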
The single most expensive mistake is waiting until you need capacity to start the conversation. Providers discount aggressively when they have time to compete. When you are 30 days from needing GPUs live, your leverage disappears.
How XIRR Can Help
XIRR Advisors is a custom-sourcing broker for reserved GPU capacity from neoclouds (CoreWeave, Lambda Labs, Crusoe, Nebius, Nscale, and others) and Tier III colocation space (99.982% uptime standard) across the USA. We represent the client. Providers pay our fee. You pay nothing.
Share your requirements, including region, GPU type, cluster size, and timing (or megawatts if you are evaluating colocation), and we will canvass the neocloud and colocation markets on your behalf and return a shortlist within 48 hours. Many clients need both GPU capacity and a colocation anchor; we handle both in one conversation. Earlier engagements get better terms because vendors have time to compete. Email contact@xirradvisors.com or DM @XIRRAdvisors.