Meta (Facebook, Instagram, etc.) is alleged to have used pirated versions of copyrighted books to train its AI models, according to plaintiffs in a recently filed lawsuit in the United States, as reported by TechCrunch. The lawsuit draws on internal Meta communications in which Mark Zuckerberg allegedly approved the use of the LibGen database, a vast online archive of books and articles, most of it considered pirated.
The CEO reportedly decided to proceed despite warnings from his team about the legally questionable origins of this material. Those warnings were reportedly motivated not by ethical concerns but by the risk that using LibGen could affect the company's negotiations with the bodies beginning to draft rules on the use of content for training models. The lawsuit cites internal concerns that media coverage suggesting Meta had used a dataset known to be pirated, such as LibGen, could weaken its negotiating position with regulators.
This situation exemplifies a troubling dynamic among AI giants, which rely on vast amounts of content, often sourced directly from the internet, to train their models. While copyright laws vary by region, much of this content is protected by intellectual property law, and in theory authors should be able to refuse its use or demand compensation. In practice, this is rarely the case, as companies like Meta, Google, and OpenAI exploit the legal gray areas surrounding generative AI to continue harvesting content without compensation.