At NVIDIA GTC 2024, NVIDIA unveiled the next-generation successor to its Hopper architecture GPUs, right on schedule with its two-year GPU update cycle. CEO Jensen Huang took the stage at the SAP Center for the first time in five years to showcase the company’s latest offerings, which have been flying off the shelves faster than TSMC can produce them. Discover more about the new NVIDIA GB200.
American semiconductor company NVIDIA introduced its latest AI chips at the GPU Technology Conference (GTC) 2024, signaling its resolve to keep charging ahead in a race where it already leads by miles, or rather, by trillions.
Taking the stage at NVIDIA GTC 2024 for the first time in five years, CEO Jensen Huang showed off the company’s latest wares, which have been selling faster than TSMC can fabricate them.
The compute-hungry demands of AI training and inference have some of the world’s biggest companies, including Microsoft, Alphabet, Amazon, and Meta, stocking up on GPUs for their respective service deployments.
In fact, demand for the Blackwell chips’ predecessors, the Hopper architecture-based H100 and GH200, is so high that NVIDIA posted $22.1 billion in revenue in Q4 of fiscal 2024, up 22% quarter-over-quarter (QoQ) and 265% year-over-year (YoY). During the earnings call in February, Huang had to clarify that the company doesn’t play favorites when fulfilling orders. “We allocate fairly. We do the best we can to allocate fairly, and to avoid allocating unnecessarily,” Huang said.
Now consider this: the GB200 chip is 5x faster and more efficient, consuming up to 25% less power thanks to copper cabling in place of optical transceivers. It packs 208 billion transistors to the H100’s 80 billion, delivers 20 petaflops against the H100’s 4 petaflops, and leverages the new fifth-generation NVLink, which provides 1.8 TB/s of bidirectional throughput per GPU and scales up to 576 GPUs (versus 256 GPUs with the H100).
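A quick back-of-envelope pass over those figures shows where the headline multiples come from. The sketch below is our own reconstruction, not NVIDIA’s published math; in particular, the GB200’s FP8 throughput is inferred from the common assumption that halving precision doubles throughput.

```python
# Back-of-envelope on the spec-sheet figures above (our reconstruction,
# not NVIDIA's published math). The GB200 FP8 figure is inferred from the
# assumption that halving precision doubles throughput.

h100_fp8_pflops = 4                         # H100 per-GPU throughput (FP8)
gb200_fp4_pflops = 20                       # GB200 per-GPU throughput (FP4)
gb200_fp8_pflops = gb200_fp4_pflops / 2     # assumed FP8 throughput

print(gb200_fp4_pflops / h100_fp8_pflops)   # 5.0  -> the "5x faster" claim
print(gb200_fp8_pflops / h100_fp8_pflops)   # 2.5  -> matched-precision gain
print(208 / 80)                             # 2.6  -> transistor-count ratio
print(576 / 256)                            # 2.25 -> NVLink domain scaling
```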
However, there’s a catch: NVIDIA is taking what made the H100 successful up a notch. The company is introducing 4-bit floating point (FP4) precision in the GB200 for throughput gains.
> Blackwell, the new beast in town.
>
> DGX Grace-Blackwell GB200: exceeding 1 Exaflop compute in a single rack.
>
> Put numbers in perspective: the first DGX that Jensen delivered to OpenAI was 0.17 Petaflops.
>
> GPT-4-1.8T parameters can finish training in 90 days on 2000 Blackwells.…
>
> — Jim Fan (@DrJimFan) March 18, 2024
When NVIDIA introduced the H100, it cut floating-point precision from half precision (FP16) down to 8-bit floating point (FP8), which enabled it to extract higher throughput from the GPU. With FP4, the GB200 offers 5x the performance of the H100; compared at the same FP8 precision, the GB200 is 2.5x faster.
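NVIDIA did not spell out FP4’s exact semantics at the keynote. Assuming the common E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit), a minimal sketch can enumerate every representable value, which makes the trade plain: throughput doubles precisely because each value carries so few bits.

```python
# A minimal sketch, assuming an E2M1 layout for FP4 (1 sign bit, 2 exponent
# bits with bias 1, 1 mantissa bit) -- an assumption on our part, not
# NVIDIA's published spec. Enumerating all 16 encodings shows how coarse
# a 4-bit float really is.

BIAS = 1

def e2m1_value(bits: int) -> float:
    sign = -1.0 if (bits >> 3) & 1 else 1.0
    exp = (bits >> 1) & 0b11        # 2-bit exponent field
    mant = bits & 0b1               # 1-bit mantissa field
    if exp == 0:                    # subnormal: no implicit leading 1
        return sign * (mant / 2.0) * 2.0 ** (1 - BIAS)
    return sign * (1.0 + mant / 2.0) * 2.0 ** (exp - BIAS)

print(sorted({e2m1_value(b) for b in range(16)}))
# [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

That is just 15 distinct values, versus hundreds for FP8 and tens of thousands for FP16, which is why 4-bit formats generally lean on scaling tricks to stay numerically usable.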
The GB200 is chiplet-esque, pairing two Blackwell NVIDIA B200 Tensor Core GPUs with an NVIDIA Grace CPU. However, because each Blackwell GPU spans two dies built on the TSMC 4NP node (the H100 uses the TSMC 4N node), which offers no real spatial improvement, the package can be bulkier than what AI devs are used to. Still, each Blackwell die packs more transistors than an entire H100. NVIDIA could turn to TSMC’s 3nm-class N3 node for its next GPU architecture.
Further, NVIDIA has packaged the GB200 into the multi-node, liquid-cooled NVIDIA GB200 NVL72, a 3,000-pound rack system comprising 600,000 parts, including 72 Blackwell GPUs (36 GB200 superchips), and capable of delivering 720 petaflops for training and 1.4 exaflops for inference. NVIDIA says the rack provides up to a 30x performance improvement for LLM inference over the same number of H100 chips.
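Dividing those rack-level numbers back out per GPU is a useful sanity check; it lands close to the per-chip specs quoted earlier. The split below assumes the training figure is FP8 and the inference figure FP4, which is our reading rather than an NVIDIA-stated breakdown.

```python
# Sanity-checking the NVL72 rack figures against the per-GPU specs quoted
# earlier. Assumes the training number is FP8 and the inference number FP4
# (our reading, not an NVIDIA-stated breakdown).

GPUS_PER_RACK = 72
TRAIN_PFLOPS = 720          # whole-rack training throughput
INFER_PFLOPS = 1_400        # whole-rack inference throughput (1.4 exaflops)

print(TRAIN_PFLOPS / GPUS_PER_RACK)   # 10.0 petaflops per GPU (FP8)
print(INFER_PFLOPS / GPUS_PER_RACK)   # ~19.4 petaflops per GPU (FP4),
                                      # in line with the 20-petaflop figure
```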
“The amount of energy we save, the amount of networking bandwidth we save, the amount of wasted time we save, will be tremendous,” Huang told an 11,000-strong audience at the SAP Center in San Jose. “The way we compute is fundamentally different. We created a processor for the generative AI era.”
NVIDIA claimed that the GB200 NVL72 rack alone can power a 27-trillion-parameter model. For comparison, GPT-4 reportedly has about 1.76 trillion parameters, Claude 2.1 is estimated at between 520 billion and one trillion, Llama 2 has 70 billion, and Gemini possibly has around 1.56 trillion.
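One way to make sense of the 27-trillion figure (our arithmetic, not NVIDIA’s stated reasoning): at FP4, the weights alone come to 13.5 TB, which is on the order of the pooled HBM reported for an NVL72 rack.

```python
# Rough sanity check, not an NVIDIA calculation: weight storage for a
# 27-trillion-parameter model at 4-bit (FP4) precision.

params = 27e12               # 27 trillion parameters
bytes_per_param = 0.5        # 4 bits = half a byte
print(params * bytes_per_param / 1e12)   # 13.5 TB of weights alone
```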
“There is currently nothing better than NVIDIA hardware for AI,” NVIDIA quoted Tesla and xAI CEO Elon Musk in its press release, alongside eight other industry leaders: Microsoft CEO Satya Nadella, Oracle chairman and CTO Larry Ellison, Alphabet CEO Sundar Pichai, Meta CEO Mark Zuckerberg, Amazon president and CEO Andy Jassy, Dell CEO Michael Dell, Google DeepMind cofounder and CEO Demis Hassabis, and OpenAI CEO Sam Altman.
Products based on the Blackwell architecture will be available later this year. GB200 will also be available on the NVIDIA DGX Cloud.
Source: Spiceworks