Companies relying on AI may soon be in trouble

By: Trademagazin Date: 2024. 08. 09. 11:28

The artificial intelligence (AI) industry faces a growing problem: the supply of real, human-generated data is starting to run out. According to a Business Insider article, synthetic data could be a solution, although many experts are skeptical. Tech giants such as OpenAI and Google have so far relied on text, video, and other media content available on the Internet to train their large language models (LLMs). However, the supply of real data is already shrinking, and text data is predicted to run out by 2028.

As traditional data sources are exhausted, more and more companies are turning to synthetic data, which is generated by artificial intelligence systems and modeled on real data. Nvidia and Chinese tech giant Tencent have also developed models that can create artificial datasets. However, related research warns that excessive use of synthetic data can cause irreversible errors and model collapse.

Models trained on AI-generated data can end up producing gibberish, according to a study by a research team from Oxford and Cambridge. According to Gary Marcus, an AI analyst at New York University, synthetic data cannot provide real reasoning or planning. Jathan Sadowski, a senior analyst at Monash University, has dubbed the phenomenon “Habsburg AI”, referring to the Austrian dynasty that was undone by inbreeding.

AI developers like OpenAI and Google pay huge sums for data from Reddit and news outlets to feed their models with fresh data. When real data runs out, companies will have to rely on synthetic data. According to some research, hybrid datasets, which mix real and synthetic data, can provide a more stable basis for models.
