DeepSeek: The Rise of a Chinese AI Startup

Lola Ti

Introduction

In July 2024, a young Chinese entrepreneur—whom we will get to know in a bit—gave an interview to 36Kr, a media company specializing in financial and tech news.

The Context of OpenAI

That same month, the American company OpenAI, one of the leading forces in machine learning and large language models, was experiencing some of its brightest days. To put things in perspective, in July alone, its crown jewel, ChatGPT, logged approximately 1.7 billion visits.

OpenAI's Market Valuation

At the height of OpenAI’s market valuation, the young Chinese entrepreneur made a rather unusual statement during his interview with 36Kr. He said: “More investment does not necessarily lead to more innovation... OpenAI is not a god, nor can it always stay ahead.”

The World Economic Forum

Fast forward to today—January 2025. Just last week, between the 20th and 24th, the World Economic Forum was held in Davos, Switzerland, as it is every year. This time, the forum’s focus was on the importance of innovation and modern technology.

American Tech Giants' Consensus

According to Bloomberg, private discussions on the sidelines of the conference revealed that American tech giants had reached a consensus: the U.S. and its allies must build more data centers and ensure AI market regulations remain flexible, rather than rigid as they currently stand. Why? Because the U.S. wants to stay ahead of its number one competitor: China.

Ruth Porat's Statement

This sentiment was echoed in a statement by Ruth Porat, President, Chief Investment Officer, and Chief Financial Officer of Alphabet and Google. She noted that while American companies might currently be more than a year ahead of their Chinese counterparts in AI models, it’s uncertain whether the U.S. will be able to maintain this lead.

DeepSeek's Breakthrough

However, despite her measured tone, Porat’s statement now seems overly optimistic. On January 20, a Chinese AI startup called DeepSeek launched its latest open-source large language model, DeepSeek R1. This model has rivaled and even outperformed some of its American counterparts, including OpenAI’s ChatGPT.

Market Reaction

In just a few days, DeepSeek R1 dominated the global tech scene, catching U.S. investors off guard with its astonishingly low reported development cost—a fraction of what comparable OpenAI systems are believed to require. This unexpected breakthrough triggered a massive market sell-off: by Monday, American and European tech stocks had collectively lost $1 trillion in value.

Impact on NVIDIA

Among the hardest hit was Nvidia, which saw $589 billion wiped from its market value—the largest single-day loss for any company in the history of the U.S. stock market. So, what happened, in short? As the Wall Street Journal put it: a bloodbath.

Who is the Entrepreneur Behind DeepSeek?

Remember that young entrepreneur we mentioned at the start? Well, he’s the founder of DeepSeek! The big questions now are:

  • Who is this young man?
  • What’s his story?
  • What exactly is DeepSeek?
  • Why are American tech giants so alarmed by it?
  • And why is the U.S. so worried about China’s rapid advances in AI?

The Unique Journey of Liang Wenfeng

Liang Wenfeng, a Chinese mathematics prodigy, has shaken the world in recent days through DeepSeek. Born in 1985 in Zhanjiang, a poor city in Guangdong province, Liang was the son of an elementary school teacher.

Education and Early Career

This background shaped him into a top-performing student—so much so that he began studying calculus on his own in middle school. Excelling through his education, he eventually earned a place at the prestigious Zhejiang University.

Founding Jacobi

As a student, Liang began developing AI algorithms capable of identifying stock patterns based on complex criteria. A math fanatic, he was convinced—years ahead of his time—that analyzing vast amounts of market data could uncover hidden trading patterns, paving the way for huge financial gains.

Entering the Stock Market

That’s why he and two former university classmates decided to enter the Chinese stock market in 2008, in the midst of the global financial crisis, despite having zero prior trading experience. In 2013, a few years after graduating, Liang founded his first venture, an investment firm called Jacobi, named after the German mathematician Carl Gustav Jacob Jacobi.

High-Flyer and Growth

Yet even as he built Jacobi, he continued his work with his two old friends, staying on his chosen path. The trio spent years experimenting with different trading strategies before finally launching High-Flyer in 2015—a Chinese quantitative trading fund built around a proprietary computing model that analyzed market data based on price and trading volume factors.

Incorporating Machine Learning

Just one year later, they integrated machine learning into the model. This development allowed the system to delve deeper into financial and investment data, enabling it to identify optimal investment paths. As a result, High-Flyer’s assets skyrocketed, reaching over 90 billion yuan ($12.4 billion) by 2021.

Setbacks and Side Projects

However, in the following years, the fund faced setbacks, losing about a third of its asset value, and today it manages approximately $8 billion. High-Flyer’s rapid growth inspired Liang to embark on a side project he considered the true future of his company and career.

A Bold Move in 2019

In 2019, Liang began purchasing thousands of Nvidia GPUs, leaving many wondering why he was doing it. Even some investors interested in partnering with him at the time found it bizarre. One investor recounted to The Financial Times that when he first met Liang, alongside a group of other investors, he found himself face-to-face with a peculiar man sporting an even more peculiar hairstyle.

Vision for AI Models

But instead of discussing investment strategies and High-Flyer’s performance, Liang spoke at length about building a computing cluster with 10,000 chips to train AI models. Although many didn’t understand what he was doing, by late 2022, when OpenAI launched ChatGPT, only a handful of Chinese companies had more than 10,000 advanced Nvidia chips—and High-Flyer was one of them. Just as Liang had planned.

Strategic Purchases and Government Regulations

Liang’s relentless pursuit of Nvidia’s chips became even more evident when he explained his reasoning in an interview with 36Kr. When asked why he was buying so many GPUs, he likened it to buying a piano, saying: “1. I can afford it. 2. I have a group of people who desperately want to play it.”

As Liang stockpiled Nvidia’s chips, the Biden administration imposed tighter restrictions on selling high-end computing chips to China, particularly those used for training AI models. Under these restrictions, U.S. companies wanting to sell chips to Chinese firms had to significantly reduce something called interconnect bandwidth—a term that refers to the speed at which data is transferred between chips.

Nvidia's Response

To comply with U.S. export regulations, Nvidia—the global leader in AI chip design—developed a China-specific AI chip: the H800. Although this chip technically adhered to Biden’s restrictions by reducing interconnect bandwidth, Nvidia found other ways to maintain high performance, ensuring that Chinese AI firms could still build powerful models.

Opportunity for DeepSeek

This created a months-long window of opportunity for Liang to purchase large quantities of H800 chips before the Biden administration caught on to what Nvidia had done and tightened regulations further—part of the broader U.S.-China tech rivalry. Nvidia’s workaround became one of the key foundations of DeepSeek, the AI company that the obsessive Chinese billionaire Liang launched in 2023 in Hangzhou—the same tech hub that houses Alibaba.

Focus on Chinese Talent

From the outset, Liang had a clear vision: to rely solely on Chinese talent. In his rare media appearances, he described DeepSeek as a “one-of-a-kind local company, where we develop Chinese talent ourselves.” Rather than recruiting American experts, DeepSeek staffed itself entirely with Chinese employees and researchers—largely PhD graduates of top institutions such as Peking University, Tsinghua University, and Beihang University.

The Industry's Core Assumption

These results—along with DeepSeek’s unconventional approach—stunned the industry. Why? Because for years, the AI industry operated under one fundamental belief: developing AI requires massive amounts of advanced, incredibly expensive computing chips. This is precisely why no one expected serious competition from China, especially since Chinese companies have struggled to purchase high-end chips due to U.S. export restrictions.

DeepSeek's Innovative Approach

But DeepSeek took an unexpectedly brilliant approach. Lennart Heim, an AI researcher at the RAND Corporation, explained AI models using an analogy: Imagine ChatGPT as a librarian who has read every book in the library. When you ask it a question, it gives you an answer based on the knowledge it has gathered from all those books.

Challenges with Traditional AI Training

The problem with this approach? It’s long, expensive, and resource-intensive—requiring massive computing power, enormous amounts of data, and a huge electricity bill to train an AI model at that scale.

DeepSeek's Efficient Method

DeepSeek, however, said: Stop. We’re not taking the hard road. They had neither the resources nor the computing power for a traditional AI training approach—so they played it smart. What did they do?

Heim explained that instead of making the AI model—akin to a librarian—read every single book, DeepSeek made it a highly skilled librarian capable of efficiently searching for the most relevant book to answer a given question. In other words, rather than reading all the books, it only reads the one that contains the key information.
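Heim's analogy can be loosely illustrated in code. The sketch below is a toy illustration only—the topics, texts, and keyword-matching logic are invented for this example and say nothing about DeepSeek's actual method: an index is built once, and each question consults only the single most relevant entry instead of scanning everything.

```python
import re

# Toy sketch of the "skilled librarian" analogy: index topics once, then
# answer each question by consulting only the one best-matching entry
# instead of reading every "book".
books = {
    "cooking": "Season gradually and taste as you go.",
    "physics": "Force equals mass times acceleration.",
    "literature": "The novel traces a family across three generations.",
}

def tokenize(text):
    """Lowercase and split a string into a set of word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def answer(question):
    """Return the text of the single most relevant 'book' by keyword overlap."""
    q = tokenize(question)
    best = max(books, key=lambda t: len(q & (tokenize(books[t]) | {t})))
    return books[best]

print(answer("what does physics say about force and mass?"))
# → Force equals mass times acceleration.
```

The point of the analogy is the access pattern: the cost per question depends on finding the right entry, not on the total size of the collection.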

Mixture of Experts Model

Alongside this smart approach, DeepSeek also employed another technique as part of its solution: the Mixture of Experts model. What does that mean? Instead of having a single librarian capable of searching across every topic, they created a team of specialized librarians. One librarian is dedicated to cooking, another to physics, and another to specific types of literature, and so on.

Reducing Computational Strain

This method reduced the strain on computing chips, as they no longer had to process everything simultaneously, preventing system overload. By applying these strategies with remarkable intelligence, DeepSeek managed to train its AI models at just a fraction of the cost of its American competitors.
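The Mixture of Experts idea described above can be sketched numerically. This is a minimal illustration under invented dimensions and random weights—not DeepSeek's architecture: a small gating network scores the experts for each input, and only the top-scoring few actually run, so most experts stay idle on any given query.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_OUT, N_EXPERTS, TOP_K = 8, 4, 4, 2  # illustrative sizes only

# Each "expert" is a tiny linear layer here; in a real model these are MLPs,
# each effectively specialized in a different kind of input.
experts = [rng.normal(size=(D_IN, D_OUT)) for _ in range(N_EXPERTS)]
# The gating (router) network scores how relevant each expert is to an input.
gate_w = rng.normal(size=(D_IN, N_EXPERTS))

def moe_forward(x):
    """Route input x to its top-k experts and blend only their outputs."""
    scores = x @ gate_w                    # one relevance score per expert
    top = np.argsort(scores)[-TOP_K:]      # indices of the k best experts
    # Softmax over only the selected experts' scores.
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()
    # Only the chosen experts compute; the rest are skipped, saving work.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

x = rng.normal(size=D_IN)
y = moe_forward(x)
print(y.shape)  # (4,)
```

The compute saving comes from the routing step: with four experts and top-2 routing, each input touches only half of the expert parameters, and the fraction shrinks further as the expert pool grows.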

Speculation and Reactions

The shockwave DeepSeek sent through the U.S. AI industry has fueled a great deal of speculation over the past few days, much of it without concrete evidence to back it up. Some American experts have claimed that this was a message from the Chinese government to the Trump administration, proving that U.S. technological sanctions against China are ineffective.

Reality Over Theories

Others have taken the theory even further, suggesting that DeepSeek is part of a Chinese conspiracy to dismantle the American tech industry. In reality, things are often much simpler than conspiracy theories suggest.

Future of DeepSeek

And, of course, DeepSeek’s models are not without their flaws. After returning from their Chinese New Year break, Liang and his team announced that they would immediately begin work on the next version of DeepSeek. Meanwhile, one key question is now being asked by some of the top AI experts: Can DeepSeek sustain its competitive edge as the industry advances and spending increases?

Challenges Ahead

This question is crucial because we shouldn’t jump to the conclusion that major U.S. tech companies are simply wasting money or that DeepSeek is on the verge of overtaking them. So why might that be an exaggerated scenario? Because operating advanced AI models will continue to be extremely expensive even after training. Furthermore, major companies like Google and OpenAI will keep investing heavily in AI development to maintain their dominance in the market.

Conclusion

The key takeaway here is that DeepSeek has exposed the flaws in the “bigger is better” doctrine. It has demonstrated, in practice, that smaller AI models—when trained correctly and with innovative, smart techniques like the ones the Chinese team employed—can rival or even surpass the performance of much larger models.

Encouraging Competition

This means that AI companies can achieve significant advancements with much lower investments than what was previously common in the industry. As a result, investors may be encouraged to fund emerging AI startups, fostering greater competition among companies rather than allowing Silicon Valley giants to dominate cutting-edge technologies simply because they can afford massive spending.

I won’t keep you much longer—AI is a vast topic that would take multiple episodes, not just one. In fact, we might do another episode soon on the intense competition between China and the U.S. for AI dominance, as both see it as the key to shaping the future of global technology.
