Big Tech’s trapped in a glass house on AI data snatching

After exploiting user data for years, Big Tech firms are seeing the tables turn as they grab it from one another.

Large tech players racing to build more capable AI models now have fewer places to look for data on the public web. PHOTO: AFP

A few weeks ago, the chief technology officer of OpenAI was asked if her company had used YouTube videos to train its artificial intelligence (AI) systems. First, she gave a blank stare. Then there was a grimace. Finally, Ms Mira Murati gave an answer that avoided the messy and furtive world in which her company and other tech firms were operating: “Actually, I’m not sure about that.”

According to a New York Times report, OpenAI had in fact trained its AI on “more than one million hours of YouTube videos” using a speech recognition tool called Whisper. All the conversational text from the transcriptions was used to train GPT-4, the flagship large language model that underpins ChatGPT.
