
January 31, 2025

DeepSeek Causes Deep Sink

By: Aaron Wall, CFA
Partner, Portfolio Manager

Last week, we discussed the Stargate Project, a $500-billion AI project led by multiple mega-cap tech companies. The tech sector, specifically semiconductor-related stocks, experienced a significant rout on Monday tied to a new AI company in China—DeepSeek. We thought it would be useful to review the current state of AI technology and the potential impact of these new developments.

How Did We Get Here?

This story begins in late 2022. Amid continued tensions between the U.S. and China, the Biden administration issued the first of three sets of broad export controls intended to limit China’s access to advanced U.S. semiconductor chips. The timing coincided with the start of the AI revolution: OpenAI’s ChatGPT was released shortly after the first round of controls was announced. The industry moved quickly to embrace AI, and the export controls swiftly hampered China’s ability to stay competitive.

Semiconductors became the hottest commodity as major technology companies began to expend significant capital on building digital infrastructure to support the forecasted demand. These companies, dubbed “hyperscalers,” began hoarding semiconductor chips and pouring money into data centers (see last week’s edition of Investment Insights for more background on this trend).

Why Are These Chips So Important To The AI Industry?

They are critical components in the training process for large language models (LLMs) like ChatGPT. At a high level, the process of constructing an LLM is rather straightforward. We gather a significant pool of data and configure a model to process it. Then the training process begins; this is where the model “learns” the data. Finally, the model is fine-tuned as it is prepared for release.

An analogy to simplify this process: think of the finished LLM as a librarian. We gather all the books in our library, and the librarian reads them. As a result, the librarian can answer any question whose answer can be found in those books. The real process is certainly more complex, but the comparison is a useful starting point.
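
To make those steps a bit more concrete, here is a deliberately tiny illustration in Python. It is not how a real LLM is built (real models learn statistical patterns across trillions of words using the chips discussed above); it simply mirrors the same sequence of gathering data, “training” on it, and then answering from what was learned. All of the data and names in it are made up for illustration.

from collections import defaultdict

# Step 1: gather a (tiny) pool of text data.
corpus = [
    "the model learns patterns from data",
    "the model answers questions from patterns",
    "data trains the model",
]

# Step 2: configure the "model": here, just a table of word-to-next-word counts.
counts = defaultdict(lambda: defaultdict(int))

# Step 3: train: the "model" reads the text and records which word follows which.
for sentence in corpus:
    words = sentence.split()
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1

# Step 4: use the trained model: given a starting word, continue the sentence
# by repeatedly picking the most common next word seen during training.
def continue_text(start_word, length=4):
    word, output = start_word, [start_word]
    for _ in range(length):
        followers = counts.get(word)
        if not followers:
            break
        word = max(followers, key=followers.get)
        output.append(word)
    return " ".join(output)

print(continue_text("the"))  # prints something like: the model learns patterns from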

OpenAI holds one key principle above all else when constructing its LLMs: the law of scaling. Essentially, more data and more computing power should improve the training process and lead to more accurate LLMs.

Gathering more clean training data will be a challenge; at some point, the available data across the internet will plateau. Computing power, however, will not. That is why these chips have been in such high demand: the belief is that more computing power applied during training should produce a more effective training run and a smarter LLM. By contrast, typical LLMs in use today require far less processing power once they are fully operational.

Said another way, based on how these models are configured today, the amount of power needed to train an LLM dwarfs the amount needed to operate it. Because we are on the frontier of this new technology, hyperscalers are moving quickly to secure processing power via advanced semiconductors so they can train their own LLMs.
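
As a rough illustration of that gap, the sketch below uses two back-of-the-envelope approximations that are common in the research literature: training a model with N parameters on D tokens of data costs roughly 6 x N x D floating-point operations, while generating a single token from the finished model costs roughly 2 x N. The parameter and token counts are illustrative placeholders, not figures for any particular company’s model.

# Back-of-the-envelope comparison of training compute vs. the compute needed
# to answer one question, using the rough approximations noted above.
# N and D are illustrative assumptions, not any specific model's real figures.

N = 70e9                 # assumed model size: 70 billion parameters
D = 2e12                 # assumed training data: 2 trillion tokens
tokens_per_answer = 500  # assumed length of one generated answer

training_flops = 6 * N * D                # one full training run
answer_flops = 2 * N * tokens_per_answer  # one answered question

print(f"One training run: {training_flops:.1e} floating-point operations")
print(f"One answer:       {answer_flops:.1e} floating-point operations")
print(f"The training run costs as much as ~{training_flops / answer_flops:.0e} answers")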

Now that we have a sense for the importance of processing power, we can unpack why DeepSeek’s reveal caused so much turbulence earlier in the week.

DeepSeek Causes Deep Sink

DeepSeek is a Chinese company that started out as a quantitative hedge fund and, after the AI boom took hold, pivoted to developing LLMs. The release of its AI model R1 sent shockwaves through markets. The company claimed that its training budget was less than $6 million, a figure that has been widely questioned and could be misleading. Importantly, DeepSeek said that all of its code is available as open source. OpenAI, on the other hand, guards its code as a trade secret.

The initial conclusions were sweeping, with some calling this a “Sputnik moment.” DeepSeek appeared to have created an LLM on par with its U.S. competition using far fewer resources, and it made its code available for anyone to access. A scrappy upstart with limited access to resources had built a product to rival the major players at a fraction of the cost.

The natural reaction was to question whether the hyperscalers really needed to invest such large sums in data centers and advanced semiconductors. And did DeepSeek just share the secret sauce, allowing smaller players to compete with the large technology companies?

How Will This Impact AI Moving Forward?

There are two main reasons to pump the brakes on claims that this innovation will disrupt the current AI infrastructure build-out. First, there is healthy skepticism that all of DeepSeek’s claims are accurate. Some speculate that a stockpile of advanced semiconductors did, in fact, aid in the training of the model.

In an interview with CNBC last week, Alexandr Wang, the CEO of Scale AI, stated, “You know the Chinese labs, they have more H100s than, than people think… my understanding is that DeepSeek has about 50,000 H100s, which they can't talk about obviously because it is against the export controls that United States has put in place.” It is hard to know who to believe at this stage, but the fact that there is speculation is important to acknowledge.

Second, processing power adds value to digital infrastructure in uses beyond training. Recall that running these models requires much less processing power than training them.

Many believe the next evolution in LLMs is building in the ability to reason and iterate while answering questions. This would increase the time it takes to answer a question, but it is also believed to increase the accuracy of the answer. The idea is that if the LLM works through an answer in multiple steps, it has the opportunity to correct itself instead of blindly sharing an incorrect response.
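
As a concrete sketch of that idea, the short example below shows the shape of such a loop. The functions generate_draft and critique are hypothetical placeholders standing in for calls to a model; no real product’s method or API is implied. The point is simply that every extra pass through the loop consumes additional processing power.

# A simplified sketch of "think and iterate" at answer time.
# generate_draft() and critique() are hypothetical stand-ins for model calls.

def generate_draft(question, feedback=None):
    # Hypothetical stand-in for a call to the model.
    if feedback:
        return f"revised answer to '{question}' (after feedback: {feedback})"
    return f"first-pass answer to '{question}'"

def critique(draft):
    # Hypothetical stand-in for asking the model to check its own work.
    # Returns (looks_correct, feedback_for_the_next_pass).
    return ("revised" in draft, "double-check the arithmetic")

def answer_with_iteration(question, max_passes=3):
    draft = generate_draft(question)
    for _ in range(max_passes):
        looks_correct, feedback = critique(draft)
        if looks_correct:
            break                                   # the model is satisfied; stop spending compute
        draft = generate_draft(question, feedback)  # each extra pass costs more processing power
    return draft

print(answer_with_iteration("What is 17 x 24?"))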

If this is indeed where the industry heads, the processing power required to operate the models will increase.

DeepSeek’s innovation should not be understated, and we believe it is ultimately a beneficial step on the early frontier of AI. Although aspects of the training data and cost estimates remain under scrutiny, some of the actual code within the R1 model reflects innovative thinking that is now available for everyone to see, adapt, and iterate on.

We expect innovation within AI to be a dominant theme over the next cycle, and we would be careful not to overreact to moments like DeepSeek with extreme exuberance or pessimism.
