AI Chips War: Why Tech Giants Are Ditching NVIDIA’s Chips

Have you ever considered how expensive it really is to run the AI tools you use every day? The answer is staggering. Training a single top-tier model can cost over $100 million, and serving it consumes enormous amounts of energy. For the past decade, doing any of this meant relying on the hardware of a single company: NVIDIA. But a monumental shift is underway. The largest technology giants are now building their own brains.

Why would Google, Amazon, Microsoft, and Meta take on the astronomically expensive and complicated project of designing their own chips? This is not a battle over technical specifications. It is a battle over who controls the future of artificial intelligence itself. The era of the universal AI chip is over.

A Golden Cage: The NVIDIA Bottleneck

Let’s be clear. NVIDIA’s GPUs have been the undisputed engines of the AI boom. Its software platform, CUDA, became the standard language of AI development. Together they formed a de facto monopoly, and everyone, from startups to governments, was locked in.

“We aren’t just buying a chip; we’re buying an entire ecosystem,” a cloud architect at a Fortune 500 company recently told me. “The dependency is absolute.”

That dependence came at a steep price. Demand for the new H100 chips drove costs up dramatically. Waiting lists stretched on for months. The result was an enormous bottleneck. As cloud bills exploded, the tech giants realized their progress was hostage to another company’s roadmap. They needed an escape plan.

The Great Unbundling: Why Build Beats Buy

So what is driving this exodus? The reasons are as layered as a semiconductor itself. First, there is the pure pursuit of performance. A generic GPU is designed to do everything. A custom chip can be engineered for one specific task, such as serving a trillion ads or training a massive language model. The efficiency gains are titanic.

Second, and perhaps equally important, is cost. Designing in-house lets these companies cut out NVIDIA’s hefty profit margin. At the scale they operate, that means billions saved over the long run. Finally, there is strategic independence. In the high-stakes AI game, why outsource your most important engine?

Inside the Secret Silicon Labs

Each tech giant is taking a slightly different approach, one that reflects its corporate DNA.

Google, The Pioneer: Google started this trend back in 2016 with the Tensor Processing Unit (TPU). Now on their fifth generation, TPUs power Search, YouTube, and, most importantly, the Gemini model. This vertical integration gives Google a formidable advantage.

Amazon AWS, The Integrator: Amazon wants to own the entire cloud stack. With its own Inferentia and Trainium chips, AWS offers customers cheaper, faster alternatives to NVIDIA hardware. Why let a partner take a slice of your cloud revenue? Amazon wants everything under one roof.

Microsoft, The Pragmatist: Microsoft’s deep partnership with OpenAI forced its hand. To serve the massive demands of ChatGPT and Copilot, it built the Maia 100 AI chip, tailored to its cloud platform and its partner’s very large models. This is pragmatism in its rawest form.

Meta, The Scale Specialist: Meta’s world is built on scale. Its custom MTIA v2 chip has a single purpose: to power its tireless recommendation engines. For feeding the content machines of Facebook and Instagram, a tailored solution is not a luxury; it is a matter of survival.

The Ripple Effect on AI Tools

This hardware revolution has a direct effect on the AI tools available to you and me. Specialized chips give rise to applications that are both more powerful and more accessible. Consider a startup that wants to train a model: Amazon’s Trainium instances can cut its training costs by 50 percent compared with generic GPU instances.
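To make that concrete, here is a minimal, hypothetical sketch of what moving a small PyTorch training loop onto a Trainium instance can look like. It assumes a trn1 instance with the AWS Neuron SDK installed, which exposes the chip through the torch-xla bridge; the model, data, and hyperparameters below are placeholders, not a benchmark.

```python
# Hypothetical sketch: training a tiny PyTorch model on an AWS Trainium (trn1) instance.
# Assumes the AWS Neuron SDK is installed, which exposes Trainium via torch-xla.
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm

device = xm.xla_device()  # resolves to the Trainium device on a trn1 instance
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    # Placeholder batch; a real job would stream data from S3 or a local dataset.
    x = torch.randn(64, 512, device=device)
    y = torch.randint(0, 10, (64,), device=device)

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    xm.optimizer_step(optimizer)  # steps the optimizer and triggers XLA graph execution

    if step % 20 == 0:
        print(f"step {step}: loss {loss.item():.4f}")
```

The same loop runs on a CPU or GPU if you swap the device, which is exactly the portability story the cloud providers are selling alongside the lower price per training hour.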

One developer at a recent tech conference told me: “Inference time on our model dropped by 50 percent when we switched to Google’s TPUs. That’s the difference between a clunky demo and a smooth user experience.”

This is democratization in action. Lower costs and higher performance trickle down, and a new cycle of innovation begins. The next disruptive application may exist only because its creators could finally afford the compute.

An Expert’s View: The Vertical Integration Gamble

I recently spoke with Dr. Anya Sharma, a semiconductor analyst with over twenty years of experience. She framed this not as a chip war but as a philosophical shift. “The 2010s were defined by the power of software. Today we are watching a resurgence of vertical integration. The companies that control their silicon will end up setting the pace of AI innovation,” she explained.

She compared it to the automotive industry. Not every carmaker has to build its own engines, but Tesla did in order to go electric. In the same way, these tech giants are building their own engines because off-the-shelf solutions cannot handle their custom AI fuel.

The Immense Hurdles Ahead

This is not an easy road. The upfront investment is staggering: designing a modern chip costs more than $500 million, and that is before production even begins. Then there is the talent war. World-class chip designers are scarce, and every one of these companies wants them.

Software is perhaps the greatest challenge of all. NVIDIA’s moat is CUDA, and it is incredibly hard to cross. Can any of these companies build a software ecosystem as intuitive and widely used? For now, the answer is unclear. The fragmentation could also become a nightmare for developers, who are forced to write code that supports several incompatible platforms.
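To illustrate that fragmentation problem, here is a minimal, hypothetical sketch of the kind of backend-selection shim teams end up writing so one codebase can run on NVIDIA GPUs, XLA-based accelerators such as TPUs or Trainium, or plain CPUs. The helper name pick_device is illustrative, not a standard API.

```python
# Hypothetical sketch: one codebase, several incompatible accelerator backends.
import torch

def pick_device() -> torch.device:
    """Return the best available device without assuming a single vendor."""
    if torch.cuda.is_available():                 # NVIDIA GPUs via CUDA
        return torch.device("cuda")
    try:
        import torch_xla.core.xla_model as xm     # TPUs / Trainium via the XLA bridge
        return xm.xla_device()
    except ImportError:
        return torch.device("cpu")                # fallback when no accelerator is present

device = pick_device()
model = torch.nn.Linear(256, 4).to(device)
x = torch.randn(8, 256, device=device)
print(f"Running on {device}: output shape {tuple(model(x).shape)}")
```

Every additional backend multiplies the testing, tuning, and debugging work behind a shim like this, which is precisely why CUDA’s single, mature ecosystem remains such a powerful moat.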

A New Chapter in Computing

We are at the dawn of a new architectural age. Previous decades belonged to the CPU and the GPU. Now the ASIC is taking center stage in AI. This isn’t a minor trend. It is a paradigm shift in computing.

My strong opinion? The winner of the AI decade will not be the company with the most sophisticated algorithm. It will be the one that best marries its software ambition to its silicon reality. The next generation of world-changing AI tools will be born not only in code, but in the clean rooms of purpose-built chip labs. The brains of our AI future are being in-sourced.
