Our monetization model is that with each one of our partners, they rent a sandbox on DGX Cloud, where we work together. They bring their data, they bring their domain expertise, we bring our researchers and engineers, and we help them build their custom AI. We help them make that custom AI incredible. Then that custom AI becomes theirs. And they deploy it on a runtime that is enterprise grade, enterprise optimized, performance optimized, and runs across everything NVIDIA. We have a giant installed base in the cloud, on-prem, anywhere. And it's secure, constantly patched, optimized and supported. We call that NVIDIA AI Enterprise. NVIDIA AI Enterprise is $4,500 per GPU per year; that's our business model. Our business model is basically a license.
Our customers then, with that basic license, can build their monetization model on top of it. In a lot of ways we're wholesale, they become retail. They could have a subscription license base, they could do per instance, or they could do per usage; there are a lot of different ways that they could create their own business model, but ours is basically like a software license, like an operating system. And so our business model is: we help you create your custom models, and you run those custom models on NVIDIA AI Enterprise. And it's off to a great start. NVIDIA AI Enterprise is going to be a very large business for us.
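To make the wholesale/retail framing concrete, here is a back-of-envelope sketch. Only the $4,500 per GPU per year license figure comes from the remarks above; the fleet size and the customer's own retail price are hypothetical numbers chosen purely for illustration.

```python
# Hypothetical sketch of the wholesale/retail economics described above.
# Only the $4,500/GPU/year NVIDIA AI Enterprise figure is from the call;
# the fleet size and retail price below are illustrative assumptions.

NVAIE_LICENSE_PER_GPU_PER_YEAR = 4_500  # USD, NVIDIA AI Enterprise (wholesale)

def annual_license_cost(num_gpus: int) -> int:
    """Yearly license cost a customer pays NVIDIA for a GPU fleet."""
    return num_gpus * NVAIE_LICENSE_PER_GPU_PER_YEAR

# Example: a customer running 1,000 GPUs and reselling capacity per instance.
fleet = 1_000
wholesale = annual_license_cost(fleet)      # 4,500,000 USD per year
retail_per_instance_year = 7_000            # hypothetical customer price
gross = fleet * retail_per_instance_year - wholesale
print(f"wholesale: ${wholesale:,}, customer gross margin: ${gross:,}")
```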
Operator: Your next question comes from the line of Stacy Rasgon of Bernstein Research. Your line is open.
Stacy Rasgon: Hi, guys. Thanks for taking my questions. Colette, I wanted to know, if it weren't for the China restrictions, would the Q4 guide have been higher, or are you supply-constrained and just reshipping stuff that would have gone to China elsewhere? And I guess along those lines, can you give us a feeling for where your lead times are right now in data center, and with the China redirection as it is, is it lowering those lead times, because you've got parts that are immediately available to ship?
Colette Kress: Yeah. Stacy, let me see if I can help you understand. Yes, we are still working on improving our supply each and every quarter. We've done a really solid job of ramping every quarter, which has defined our revenue. But with the absence of China from our outlook for Q4, sure, there are some things we are not supply-constrained on that we could have sold but now cannot. So could our guidance have been a little higher in Q4? Yes. We are still working on improving our supply and plan to continue growing all throughout next year as well.
Operator: Your next question comes from the line of Matt Ramsay of TD Cowen. Your line is open.
Matt Ramsay: Thank you very much. Congrats, everybody, on the results. Jensen, I had a two-part question for you, and it comes off of sort of one premise. And the premise is, I still get a lot of questions from investors thinking of AI training as being NVIDIA's dominant domain, and that as inference, even large model inference, takes more and more of the TAM, the market will become more competitive, you'll be less differentiated, et cetera, et cetera. So I guess the two parts of the question are: number one, maybe you could spend a little bit of time talking about the evolution of the inference workload as we move to LLMs and how your company is positioned for that rather than for smaller model inference. And second, up until a month or two ago, I never really got any questions at all about the data processing piece of the AI workloads.
So the pieces of manipulating the data before training, between training and inference, and after inference, I think that's a large part of the workload now. Maybe you could talk about how CUDA is enabling acceleration of those pieces of the workload. Thanks.
Jensen Huang: Sure. Inference is complicated. It's actually incredibly complicated. This quarter we announced one of the most exciting new engines, an optimizing compiler called TensorRT-LLM. The reception has been incredible. You go to GitHub, it's been downloaded a ton, a whole lot of stars, integrated into stacks and frameworks all over the world, almost instantaneously. And there are several reasons for that, obviously. We could create TensorRT-LLM because CUDA is programmable. If CUDA and our GPUs were not so programmable, it would really be hard for us to improve software stacks at the pace that we do. TensorRT-LLM, on the same GPU, without anybody touching anything, improves the performance by a factor of two.
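For context, here is a minimal sketch of what running a model through TensorRT-LLM looks like, assuming the high-level Python LLM API that ships with recent releases of the library; the model name, prompt, and sampling settings are illustrative placeholders, not anything referenced on the call, and the exact API surface has evolved since this announcement.

```python
# Minimal sketch, assuming TensorRT-LLM's high-level Python LLM API.
# Model name, prompt, and sampling settings are illustrative only.
from tensorrt_llm import LLM, SamplingParams

# Building the LLM compiles/optimizes the model into a TensorRT engine,
# which is where the inference speedups discussed above come from.
llm = LLM(model="meta-llama/Llama-2-7b-hf")

params = SamplingParams(max_tokens=64, temperature=0.8)
outputs = llm.generate(["What does an optimizing compiler do?"], params)

for out in outputs:
    print(out.outputs[0].text)
```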
And then on top of that, of course, the pace of our innovation is so high. H200 increases it by another factor of two. And so, our inference performance improved, or another way of saying it, our inference cost reduced, by a factor of four within about a year's time. And so, that's really, really hard to keep up with. The reason why everybody likes our inference engine is because of our installed base. We've been dedicated to our installed base for 20 years, 20-plus years. We have an installed base that is not only the largest in every single cloud, it's available from every enterprise system maker, and it's used by companies in just about every industry. Anytime you see an NVIDIA GPU, it runs our stack. It's architecturally compatible, something we've been dedicated to for a very long time.
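The compounding claim above is simple arithmetic; a tiny sketch just to spell it out, using only the two factors of two quoted on the call:

```python
# Back-of-envelope arithmetic for the compounding gains described above.
software_speedup = 2.0   # TensorRT-LLM on the same GPU (from the call)
hardware_speedup = 2.0   # H200 over the prior generation (from the call)

total_speedup = software_speedup * hardware_speedup   # 4.0x throughput
relative_inference_cost = 1.0 / total_speedup         # 0.25, i.e. cost / 4

print(f"{total_speedup:.0f}x throughput, {relative_inference_cost:.2f}x cost")
```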