New performance data shows that NVIDIA’s next-generation Blackwell Ultra platform delivers up to 50× higher throughput per megawatt and up to 35× lower cost per million tokens for agentic AI workloads compared with the previous Hopper generation — enabling significantly faster, more economically scalable real-time coding assistants and autonomous AI agents.
NVIDIA’s latest Blackwell Ultra platform is reshaping the economics of agentic and interactive AI applications with dramatic performance and cost improvements. According to new SemiAnalysis InferenceX data, systems built on the GB300 NVL72 — powered by Blackwell Ultra GPUs — now achieve up to 50× better throughput per megawatt and 35× lower cost per million tokens than NVIDIA’s earlier Hopper architecture.
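The two headline metrics are related in a straightforward way: at a fixed system power and hourly cost, higher token throughput directly lowers the cost of each generated token. A minimal sketch, using purely hypothetical numbers (the throughput and cost figures below are illustrative, not taken from the SemiAnalysis data):

```python
def throughput_per_mw(tokens_per_sec: float, power_mw: float) -> float:
    """Inference throughput normalized by rack power (tokens/sec per MW)."""
    return tokens_per_sec / power_mw

def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_sec: float) -> float:
    """USD per one million generated tokens at a given hourly system cost."""
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

# Hypothetical comparison: a baseline system vs. one with 50x the throughput
# at the same hourly cost. The cost-per-million-tokens ratio is then also 50x.
baseline_tps, faster_tps = 1_000.0, 50_000.0
hourly_cost = 98.0  # illustrative USD/hour, not a real price
ratio = (cost_per_million_tokens(hourly_cost, baseline_tps)
         / cost_per_million_tokens(hourly_cost, faster_tps))
print(round(ratio, 6))
```

This is only meant to show why throughput gains translate into per-token cost reductions; real deployments also vary in power draw, utilization, and pricing, which is why the reported throughput and cost multipliers (50× vs. 35×) differ.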
This leap in efficiency is especially impactful for low-latency, long-context AI workloads, such as multistep coding assistants and agentic reasoning systems, where both responsiveness and token-cost efficiency are critical. Continuous optimizations in NVIDIA software such as TensorRT-LLM and Dynamo, alongside open-source frameworks such as Mooncake and SGLang, further boost throughput across a range of workloads.
In real-world deployments, cloud partners like Microsoft, CoreWeave and Oracle Cloud Infrastructure are already scaling GB300 NVL72 systems to support agentic coding and long-context reasoning services, aiming to serve more users at lower operational cost.
Looking ahead, NVIDIA’s upcoming Rubin platform is expected to push performance and cost efficiency even further, potentially delivering another round of significant gains for future AI reasoning workloads.
Source: https://blogs.nvidia.com/blog/data-blackwell-ultra-performance-lower-cost-agentic-ai/