
Unsealed Files Reveal How Anthropic Quietly Scanned and Destroyed Millions of Books to Train AI

Post last modified: February 2, 2026



Newly unsealed legal documents show that Anthropic, one of the leading artificial intelligence startups, carried out a secretive initiative called “Project Panama” to acquire, cut apart, and digitally scan millions of books in order to train its AI models, including the chatbot Claude. The filings indicate the plan was intended to give the AI “high-quality text” so it could learn to write well, but the revelation has sparked controversy over ethics, copyright, and the future of creative work in the AI era.

Experts and industry observers now warn that this revelation could reshape how AI companies collect data, how authors and publishers are compensated, and how future legislation may govern AI training datasets. Anthropic has already agreed to a major settlement in related legal battles.


Why this matters now: As AI models rapidly integrate into everyday tools, from writing assistants to search engines, the sourcing and treatment of training data are under intense scrutiny worldwide. This story highlights the ethical and legal tensions at the intersection of innovation, creativity, and copyright law.

What Was Project Panama, and How Did It Work?

Internal documents from the copyright lawsuit, now unsealed by a federal judge, describe Project Panama as Anthropic’s internal codename for a sweeping effort to scan and digitize books on a massive scale. According to those filings, Anthropic executives believed that training their AI on professionally edited books — rather than solely on internet text — would improve its ability to write and understand human language.

To accomplish this, the company reportedly purchased millions of used books from sources like Better World Books and World of Books and then physically cut off their spines so every page could be digitized at high resolution. The original physical volumes were then recycled after scanning. This destructive process differed sharply from traditional library digitization projects that preserve physical books.

Anthropic spent tens of millions of dollars on the project over more than a year and even hired industry veterans with experience in large-scale book digitization. While the total number of books scanned remains partially redacted, some filings indicate ambitions to include up to tens of millions of works.

Why Anthropic Pursued Book Data

Anthropic’s leaders reportedly saw books as a “premium” data source because they contain carefully structured language, professional editing, and a wide variety of topics, in contrast to the informal internet text that scrapers often ingest. One co-founder said training on books would teach the AI “how to write well,” moving beyond what the company viewed as the “low-quality internet slang” common in other datasets.

Before resorting to physical scanning, the company also gathered digital books from unauthorized online “shadow libraries,” including LibGen and the Pirate Library Mirror. These are pirate sites where millions of books can be downloaded without permission from authors or publishers. Anthropic later settled a class-action copyright lawsuit over these practices for $1.5 billion, although the company denied that it used those pirated books to train commercial models.

These practices illustrate the lengths to which AI companies have gone — legally and ethically — to build massive text databases that power modern large language models. Competitors like Meta, Google, and OpenAI have faced similar legal challenges.

Legal Rulings and Ethical Controversy

U.S. courts have weighed in on whether AI training on copyrighted books constitutes fair use. In one landmark 2025 ruling, a judge said that using books in “transformative” ways — such as to help an AI learn to generate new text — can be fair under copyright law, even if the AI system internally stores elements of the original works. This ruling helped Anthropic defend its training methods.

Still, other aspects of how Anthropic obtained its data — especially downloading pirated copies — brought substantial legal risk. The company chose to settle rather than face further court battles. Authors whose books were involved may receive payouts as part of that settlement.

Critics argue the approach raises serious ethical questions about consent, compensation for creators, and the future of human expression in the age of AI. Authors’ advocacy groups and publishing associations have called for more responsible AI deployment and policies that protect creative labor.

Impact on AI Development and Creative Industries

The newly unsealed revelations are igniting debate across the tech and publishing communities. For AI developers, the case highlights the need to balance innovation with ethical sourcing of training data; many fear that reliance on scraped or destructively scanned material could undermine trust in AI technologies. For authors and publishers, it underscores ongoing concerns that their work fuels powerful AI systems without adequate recognition or payment.

Regulators in multiple countries may revisit copyright law and AI training data standards in response, potentially introducing new requirements for transparency and fair compensation. Some legal analysts predict this story will influence future lawmaking, litigation strategy, and corporate policies on data collection.

At the same time, AI companies are under pressure to clarify how they source data and ensure their tools are developed ethically — priorities that could shape the next generation of AI products.

What Comes Next for AI and Copyright

The controversy surrounding Anthropic’s book-scanning program is more than a single corporate story: it reflects broader tensions in how society defines ownership, creativity, and the role of machine intelligence. As AI continues integrating into everyday life, policymakers, creators, and companies will need clearer standards for how AI systems are trained and whose work fuels them.

This debate isn’t just legal or technical; it touches on cultural values about preserving human expression and rewarding the creators who enrich global knowledge. For those watching AI’s evolution, the Anthropic revelations are a watershed moment — one that could shape both innovation and accountability in the years ahead.

