Google Will Scrape Reddit Communities For AI Parts in $60 Million Deal: Report

Apr 11, 2024
Reddit reportedly signed a $60 million deal with Google to allow its online communities to be scraped for AI training data, Reuters reported Thursday. Google will sift through millions of posts on Reddit and train a large language model on Reddit’s threads. Bloomberg originally reported the content deal but did not identify the “large AI company” receiving the data.

Reddit is reportedly weighing an IPO with a $5 billion valuation, despite only bringing in $800 million in revenue last year. Reddit is not profitable but has a rich valuation because its online communities offer a perfect training ground for AI models. However, licensing out your user base’s thoughts and ideas is not always well received. The most popular subreddits went dark in protest last year after users took issue with the company charging for access to its application programming interface (API), a change first announced in April of 2023.

Reddit’s reported deal with Google is exactly what the platform has been looking for. Big Tech is hungry for data, and that has turned legacy news organizations, community forums, and even the University of Michigan into mere content farms. These deals, though upsetting to users, offer Reddit a path to profitability.

“The Reddit corpus of data is really valuable,” said Reddit CEO Steve Huffman to The New York Times in April. “But we don’t need to give all of that value to some of the largest companies in the world for free.”

But when Reddit started charging for API access, it didn’t just charge big companies; it also started charging small, independent researchers. This shift made it more difficult for Reddit’s moderators to manage their communities, and some argued it made for a worse experience for Reddit’s 800 million monthly active users.

“We believe that the longevity and success of this platform rest on preserving the rich ecosystem that has developed around it,” said Reddit moderators in a collective letter from last June. “The potential loss of these services due to the pricing change would significantly impact our ability to moderate efficiently, thus negatively affecting the experience for users in our communities.”

Reddit did not immediately respond to Gizmodo’s request for comment.

Apple was exploring $50 million AI deals with The New York Times, Condé Nast, and other news publishers in December. Shutterstock is also licensing its human-made content to OpenAI to train OpenAI’s models. Twitter, Instagram, and YouTube have also become increasingly valuable in recent years, as they’re now seen as content gold mines.

Reddit also introduced ads in recent years and made it impossible for users to opt out of seeing advertiser content in 2023. As Reddit becomes a public company, there’s a growing concern among users that management will hurt the thriving community forum it has built.

There’s also a bigger concern about how AI companies are licensing data. Content platforms are signing million-dollar licensing agreements with AI companies, but the actual people who created this content aren’t getting a thing. Meanwhile, AI threatens to replace content creators in the editorial, graphic design, and film industries.