We're here in sunny San Jose California for the return of an event that's been a long-time coming: NVIDIA's in-person GTC. The Spring 2024 event, NVIDIA's marquee event for the year, promises to be a big one for NVIDIA, as the company is due to deliver updates on its all-important datacenter accelerator products – the successor to the GH100 GPU and its Hopper architecture – along with NVIDIA's other professional/enterprise hardware, networking gear, and, of course, a slew of software stack updates.

In the 5 years since NVIDIA was last able to hold a Spring GTC in person, a great deal has changed for the company. They're now the third biggest company in the world, thanks to explosive sales growth (and even further growth expectations) due in large part to the combination of GPT-3/4 and other transformer models, and NVIDIA's transformer-optimized H100 accelerator. As a result, NVIDIA is riding high in Silicon Valley, but to keep doing so they also will need to deliver the next big thing to push the envelope on performance, and keep a number of hungry competitors off their turf.

Headlining today's keynote is, of course, NVIDIA CEO Jensen Huang, whose kick-off address has finally outgrown the San Jose Convention Center. As a result, Huang is filling up the local SAP Center arena instead. Suffice it to say, it's a bigger venue for a bigger audience for a much bigger company.

So come join the AnandTech crew for our live blog coverage of NVIDIA's biggest enterprise keynote in years. The presentation kicks off at 1pm Pacific, 4pm Eastern, 20:00 UTC.


04:04PM EDT - This is Ryan. Apologies for the late start here folks, it's been an interesting time getting everyone down to the floor and settled. (I'm told the capacity of SAP Arena for concerts is 18,500 people. With this floor layout, there may be closer to 20,000 here)

04:04PM EDT - So without further ado, here's Jensen!

04:04PM EDT - "This is not a concert. You have arrived at a developers conference"

04:04PM EDT - "I sense a very heavy weight in the room all of a sudden"

04:05PM EDT - Jensen is recapping the many technologies to be seen here. "Even artificial intelligence"

04:06PM EDT - Showing off a iist of exhibitors. It's a very big list of names

04:07PM EDT - "The computer is the single most important instrument in society today"

04:07PM EDT - How did we get here? Jensen drew a comic flow chart

04:09PM EDT - CUDA became a success... eventually. A bit later than Jensen would have liked

04:09PM EDT - 2023: Generative AI emerged, and a new industry begins

04:09PM EDT - "The software never existed before, it is a brand new category"

04:10PM EDT - Comparing generative AI to the industrial revolution and the age of energy

04:11PM EDT - "We're going to talk about how we're doing computing next"

04:11PM EDT - As well as software and applications in this industry. And how to start preparing for what's next

04:11PM EDT - Everything we'll be shown today is a simulation, not an animation

04:12PM EDT - "It's being animated with robotics. It's being animated with artificial intelligence"

04:12PM EDT - "Everything is homemade"

04:13PM EDT - Rolling a demo reel of GPU-accelerated applications

04:13PM EDT - Warp, PhysX Flow, Photoshop, and more

04:15PM EDT - NVIDIA's GPUs are only worth using because of the software written for them. So NVIDIA has always made it a point to showcase what the software development community has been up to

04:15PM EDT - "General purpose computing has run out of steam"

04:15PM EDT - Need new ways to keep growing computing. Keep consuming computing

04:16PM EDT - It's not about driving down the cost, it's about driving up the scale

04:16PM EDT - Jensen would like to simulate everything they do in digital twin virtual environments

04:17PM EDT - NVIDIA will have partners joining them today

04:17PM EDT - Jensen will be announcing several important partnerships

04:17PM EDT - Ansys

04:18PM EDT - Sysopsys. NVIDIA's very first software partner. Literally

04:18PM EDT - Accelerating computational lithography

04:18PM EDT - TSMC is announcing today that they're going into production with cuLitho

04:19PM EDT - Cadence, the EDA tool maker, is joining Club GPU as well

04:20PM EDT - Going to connect Cadence's digital twin platform to Omniverse

04:20PM EDT - Omniverse will be the fundamental operating system for digital twins

04:20PM EDT - Now recapping the history of large language models, and the hardware driving them

04:21PM EDT - Doubling the parameter count requires increasing the token count

04:21PM EDT - 1.8T parameters is the current largest model. This requires several billion tokens to train

04:22PM EDT - Even a PetaFLOP GPU would take 30 billion seconds to train that model. That's 1000 years

04:22PM EDT - "What we need are bigger GPUs" "Much much bigger GPUs"

04:22PM EDT - The answer is to put a whole bunch of GPUs together

04:23PM EDT - As well as developing technologies like NVLink and tensor cores

04:23PM EDT - To help the world build these bigger systems, NVIDIA has to build them first

04:24PM EDT - "We need even larger models"

04:26PM EDT - "We're going to have to build even bigger GPUs"

04:26PM EDT - And here's Blackwell. "A very, very big GPU"

04:27PM EDT - Named after David Backwell, the mathematician and game theorist

04:27PM EDT - Now rolling a video

04:27PM EDT - Two dies on one package, full cache coherency

04:27PM EDT - 208B transistors

04:27PM EDT - 192GB HBM3E@8Gbps

04:27PM EDT - 1.8TB/sec NVLink bandwidth per chip

04:28PM EDT - Available as an accelerator and as a Grace Blackwell Superchip

04:28PM EDT - NVLink 5. Which comes with a new switch chip

04:29PM EDT - And NVDIA is building a rack-scale offering using GB200 and the new NVLink opertions, GB200 NVL72

04:29PM EDT - NVLink 5 scales up to 576 GPUs

04:29PM EDT - "Blackwell is not a chip. It's the name of a platform"

04:30PM EDT - Holding up Blackwell next to a GH100 GPU

04:30PM EDT - "It's okay, Hopper"

04:30PM EDT - "You're very good. Good girl"

04:31PM EDT - 10TBps link between the dies

04:31PM EDT - No memory locality issues or cache issues. CUDA sees it as a single GPU

04:32PM EDT - The chip goes into two types of systems. B100, which is designed to be drop-in compatible with H100/H200 HGX systems

04:33PM EDT - Jensen showing off Blackwell boards... and trying not to drop them

04:33PM EDT - GB200. 1 Grace CPU + 2 Blackwell GPUs (4 GPU dies)

04:33PM EDT - GB200 is memory coherent

04:34PM EDT - "This is s miracle"

04:34PM EDT - NVLink on top, PCIe Gen 6 on the bottom

04:34PM EDT - And that's Grace Blackwell

04:34PM EDT - "But there's more!"

04:35PM EDT - "We need a whole lot of new features"

04:35PM EDT - Second generation transformer engine

04:35PM EDT - Automaticaly and dynamically recast numerical formats to a lower precision when possible

04:36PM EDT - AI is about probability. Need to figure out when you can use lower precisions and when you can't

04:36PM EDT - Support down to FP4

04:36PM EDT - 5th generation NVLink

04:38PM EDT - Jensen is also pushing the importance of reliability. No massive cluster is going to stay up 100% of the time, especially running weeks on end

04:38PM EDT - So NVIDIA has added a RAS engine that can do a full, in-system self-test of a node

04:39PM EDT - Security remains a popular subject as well. Protecting the results of a model training, protecting the model itself. Which means supporting full speed encryption throughout

04:40PM EDT - Blackwell also adds a line decompression engine that can sustain 800GB/sec of transfers

04:40PM EDT - New: FP4 support. FP6 as well

04:41PM EDT - Meanwhile FP8 performance is 2.5x that of Hopper

04:41PM EDT - FP4 gains are even greater, since Hopper can't process FP4 natively

04:42PM EDT - "The future is generative"

04:43PM EDT - 5x the inference/token generation ability of Hopper

04:44PM EDT - But why stop there?

04:44PM EDT - Meanwhile there's also the new NVLink chip. 50B transistors, built on TSMC 4NP

04:45PM EDT - Connecting GPUs to behave as one giant GPU

04:46PM EDT - Which helps NVIDIA build the DGX GB200 NVL72

04:46PM EDT - 18 rack units of GB200 nodes, each node with 2 GB200s

04:46PM EDT - 9 rack units of NVSwitches

04:47PM EDT - This gives the GB200 NVL 720 PFLOPS (sparse) of FP8 throughput

04:47PM EDT - 1 EFLOPS of FP4 in one rack

04:48PM EDT - 5000 NVLink cables. 2 miles of cables

04:48PM EDT - And those are all copper cables. No optical transceivers needed

04:48PM EDT - That saved 20kW to be spent on computation

04:50PM EDT - Weight: 3000 pounds (prime, many more pounds)

04:51PM EDT - Traijning GPT-MoE-18.T would take 90 days on a 8000 GPU GH100 system consuming 15W

04:51PM EDT - GB200 NVL72 can do it on 2000 GPUs with 4MW of power

04:52PM EDT - Inference of LLMs is a challenge due to their size. They don't fit on one GPU

04:54PM EDT - Throughput is everything.

04:57PM EDT - Now looking at Blackwell vs Hopper

04:58PM EDT - And this is where FP4 and NVLink switch come in

04:59PM EDT - "This ability is super, super important"

05:00PM EDT - NVIIDA started with two customers and has many now:

05:02PM EDT - AWS, Google/GCP, Microsoft/Azure, Oracle are all on board

05:04PM EDT - Now for a demo of Wistron, one of NV's partners, who is building a digital twin of their factory using NVIDIA's technology

05:05PM EDT - The digital twin allows them to test operations and look for ways to improve

05:05PM EDT - And that's Wistron

05:06PM EDT - "That's what happens when you don't reherse"

05:07PM EDT - Recapping Alexnet in 2012, identifying cats

05:08PM EDT - AI is changing how software is written. Don't write code to teach a computer to identify a cat, create an AI model and train that to ID cats

05:09PM EDT - "Anything you can digitize, so long as there are patterns, you can learn from it"

05:10PM EDT - (We've now segued to software)

05:11PM EDT - Now on to Earth-2, NVIDIA's simulation for AI-powered weather forecasting and climatology

05:12PM EDT - CoreDiff to improve the resolution of weather models from 25km to 2km

05:12PM EDT - Generative supersampling for weather prediction

05:13PM EDT - For Taiwan right now, coming to the rest of the world later. Working with The Weather Company

05:14PM EDT - Now on to healthcare and NVIIDA

05:15PM EDT - Now rolling another video for computational drug testing

05:16PM EDT - BioNEMO

05:16PM EDT - NVIDIA has acess to a massive number of models across all fields

05:16PM EDT - But they think they're still too hard for companies to use

05:17PM EDT - "Inference is an extraordinary computation problem"

05:17PM EDT - "We're going to invent a new way to receive and operate software"

05:17PM EDT - NVIDIA Inference Microservice. A NIM

05:18PM EDT - A pre-trained model packaged and optimized to run across NVIDIA's install base

05:18PM EDT - Packaged with all the software needed to run it. CUDA libraries, APIs, etc

05:19PM EDT - Basically containerized AI software packages that are optimized for NV GPUs and come with a simple API to access them

05:19PM EDT - NIMs are getting their own landing page: https://ai.nvidia.com

05:19PM EDT - The future will be assembling a team of AIs to handle a task

05:21PM EDT - NVIDIA already has NIMs running across the company

05:21PM EDT - Including a chatbot for building chips. Co-designers with NVIDIA's engineers

05:22PM EDT - NVIDIA ChipNeMo, based on Llama 2 70B

05:22PM EDT - Asking it what a CTL is - Compute Trace Library

05:24PM EDT - "We are effectively an AI foundry"

05:24PM EDT - Comparing NVIDIA to TSMC

05:24PM EDT - Three pillars: NIMs, microservices, and the DGX cloud

05:27PM EDT - "The enterprise IT industry is sitting on a goldmine"

05:28PM EDT - They're sitting on a lot of data that could be turned into co-pilots

05:30PM EDT - "When you're ready to run these AI chatbots, you'll need an AI factory"

05:30PM EDT - Plugging partner Dell

05:31PM EDT - Now how to get AI to understand the physical world?

05:32PM EDT - It will take 3 computers. A traditional AI system, an OVX Omniverse system, and an AGX system for robitcs

05:35PM EDT - Discussing how all of these systems will interact. Ultimately robotics need a simulated environmnet to be trained in

05:35PM EDT - Video real time. All of this is running in real-time

05:36PM EDT - Demoing robots fed with sensor data, being controled by a higher AI

05:37PM EDT - "We've made Omniverse a lot easier to access"

05:38PM EDT - NVIDIA really wants to sell industrial customers on Omniverse Cloud

05:39PM EDT - NVIDIA has been partnering with Siemens on this

05:40PM EDT - Siemens is connecting their crown jewel accelerator to Omniverse

05:44PM EDT - Announcing today: Omniverse Cloud streams to Apple's Vision Pro headset

05:45PM EDT - Now on to automotive

05:45PM EDT - At the beginning of next year, NV will be shipping in Mercedes and other brands

05:46PM EDT - Drive Thor: World's only ASIL-D system that can run AI

05:46PM EDT - BYD will be adopting Drive Thor

05:47PM EDT - NVIDIA's Jetson platform for robotics is getting an update as well

05:47PM EDT - Jetson Orin

05:47PM EDT - New SDK to go with it: Isaac Perceptor

05:48PM EDT - Vision reconstruction and depth perception

05:49PM EDT - Plugging use of NV technologies in robotic arms, as well. And all the software they provide to help set this up

05:50PM EDT - And Jetson Thor? For humanoid robotics

05:50PM EDT - NVIDIA is making a new set of models and a software stack to teach humanoid robots

05:51PM EDT - Groot, which runs on Thor

05:51PM EDT - Now rolling a video on humanoid robotics/Groot

05:53PM EDT - Project Groot: A general purpose foundational model

05:54PM EDT - Learn human movement by observing them

05:54PM EDT - Powered by Jetson Thor platform

05:55PM EDT - And Jensen is lining up with robots

05:55PM EDT - General Robotics 0 0 Three (GROOT)

05:56PM EDT - And some other Jetson-based robots have wandered on stage

05:56PM EDT - Learned to walk in Isaac Sim

05:56PM EDT - Disney robots, it seems

05:57PM EDT - Pro tip: don't use "stop" with robots

05:57PM EDT - Now for the recap of today's keynote

05:58PM EDT - A new industrial revolution, powered by generative AI, with Blackwell at its core

05:58PM EDT - New computer creates new kind of software, which means new means of distributing that software. NIMs

05:59PM EDT - And in the world of robotics, all of these systems need a platform, and that is Omniverse

06:00PM EDT - This is what we announced to you today

06:00PM EDT - HGX B100, NVLink Switch, GB200 Superchip Compute Node, Quantum X800 Switch, Spectrum X800 Switch

06:01PM EDT - "One more thing to show you"

06:01PM EDT - Rolling a final video

06:03PM EDT - "Powering the new era of computing: NVIDIA"

06:03PM EDT - And that's a wrap for us. Our initial Blackwell story is up now, with more details to come once I can get more bandwidth

POST A COMMENT

10 Comments

View All Comments

  • GeoffreyA - Monday, March 18, 2024 - link

    Hmm, I wonder what Mr. Nobody-needs-to-learn-programming-any-more has got to say... Reply
  • GeoffreyA - Monday, March 18, 2024 - link

    "To help the world build these bigger systems, Nvidia has to build them first"

    Our Benefactor, Nvidia.
    Reply
  • Dante Verizon - Monday, March 18, 2024 - link

    A lot of AI-related propaganda that I don't care about, but it was surprising to learn that they're still using 4nm and not 3nm. Hmm. What would have happened? Apple? Reply
  • stadisticado - Tuesday, March 19, 2024 - link

    Yield. Its a full reticle die. Even if N3 is healthy, it might not be healthy enough to make the unit economics work. Reply
  • LordSojar - Tuesday, March 19, 2024 - link

    Propaganda is the keyword that you aren't paying close attention. It's okay... you'll get it eventually. Reply
  • Terry_Craig - Monday, March 18, 2024 - link

    The performance shown FP8 and below is with sparsity, which is why such big and pompous numbers are announced in the slides. Ugly. The actual performance is half of it lol Reply
  • TomWomack - Monday, March 18, 2024 - link

    I think from the photo it’s nine rack units of switches and 18 units of GB200 nodes, rather than nine and eighteen whole racks! Odd to have ten nodes above the switches and eight below. Reply
  • Ryan Smith - Monday, March 18, 2024 - link

    Correct. That's 18 RUs of GB200 nodes, and another 9 RUs of NVSwitches. Reply
  • James5mith - Tuesday, March 19, 2024 - link

    "04:51PM EDT - Traijning GPT-MoE-18.T would take 90 days on a 8000 GPU GH100 system consuming 15W"

    Now THAT is the kind of efficiency that nVidia should be striving for. lol
    Reply
  • tomjames1966 - Tuesday, March 19, 2024 - link

    With the data rates coming from these GPUS (10Tbps?!) aren't these things going to require a metric tonne of 800G transceivers? Or will 400G cut it? Reply

Log in

Don't have an account? Sign up now