
Image – Getty: EbruBlue
By Porter Anderson, Editor-in-Chief | @Porter_Anderson
Pallante: ‘Meta’s Systematic Copying of Protected Works’
A major artificial intelligence class-action lawsuit against Meta now includes a significant amicus brief in support of the plaintiffs.
The Association of American Publishers’ (AAP) amicus brief supports named plaintiffs including authors Michael Chabon, Ta-Nehisi Coates, Junot Diaz, Sarah Silverman, Richard Kadrey, and Christopher Golden, on behalf of themselves and others.
The lawsuit, filed December 22, 2023, in the Northern District of California, asserts that Meta appropriated millions of human-authored works—including books by the lead plaintiffs—for the purpose of training and accelerating its consumer-facing generative AI model called “LLaMA.” In other words, this is another case of “training” by a large language model, allegedly on copyrighted content without permission or payment.
The suit further asserts that Meta acquired much of the infringing material from notorious pirate sites on the watch-lists of the United States government, such as “Anna’s Archive” and “LibGen.”
Maria A. Pallante, president and CEO of the Association of American Publishers, previously served as the 12th United States Register of Copyrights. In explicating the AAP’s brief, she writes:
“In filing this amicus brief, AAP explains in detail that Meta’s systematic copying and encoding of protected creative works, word by word, into a large language model, is not a transformative fair use under the law, but rather, grossly exceeds the doctrine’s legal purpose and judicial precedent.
“The brief also corrects Meta’s spurious assertion that there is no way for AI developers to lawfully license what they seek to use, citing numerous examples to the contrary of existing and emerging markets.”
Pallante and the AAP note that Meta intended “to capture the author’s protected expression and not merely statistical information.”
AAP’s brief warns that Meta’s actions undermine the copyright incentives that Congress enacted and eviscerate the ability of authors and publishers to realize their significant creative and financial investments in books and other intellectual property.
It was in January that Kate Knibbs wrote at Wired, “Meta just lost a major fight in its ongoing legal battle with a group of authors suing the company for copyright infringement over how it trained its artificial intelligence models. Against the company’s wishes, a court unredacted information alleging that Meta used Library Genesis (‘LibGen’), a notorious so-called shadow library of pirated books that originated in Russia, to help train its generative AI language models.”
It’s worth noting that—as Kelvin Chan has written for the Associated Press—French publishers and authors said on March 12 that they, too, were “taking Meta to court, accusing the social media company of using their works without permission to train its artificial intelligence model. Three trade groups said they were launching legal action against Meta in a Paris court over what they said was the company’s ‘massive use of copyrighted works without authorization’ to train its generative AI model.”
And on March 25, Blake Brittain reported for Reuters in the United States that “Meta Platforms has asked a U.S. court to rule that it did not violate copyright law when it used books by writer Ta-Nehisi Coates, comedian Sarah Silverman, and others to train its artificial intelligence system. Meta told a federal judge in San Francisco that it made ‘fair use’ of the books in developing its large language model Llama, arguing that the authors’ lawsuit should be thrown out.”
This is where the AAP aims the sternest material in its amicus brief: at Meta’s assertion that its unauthorized and unremunerated use of copyrighted content comes under the protection of “fair use.”
Key Highlights From the AAP Amicus Brief
The Association of American Publishers makes the following points about its amicus brief:
- “The defendant, Meta Platforms, Inc. (‘Meta’), ‘is a company valued at more than US$1 trillion [and] asks this court to declare that it is free to appropriate and commercially exploit the content of copyrighted works on a massive scale without permission or payment for that content—a ruling that would have catastrophic consequences for authors and publishers of books, journals, and other textual works protected by copyright.’
- “Meta claims, ‘There is no evidence that a market for licensing books to train LLMs’ exists, and there is ‘no economically feasible mechanism for Meta or other LLM developers to obtain licensed copies.’ Meta’s claims are patently false.
- “The existence of an active market for AI training materials is indisputable.
- “Since AI emerged in public life with the launch of ChatGPT at the end of 2022, AI companies including OpenAI (the company behind ChatGPT), Microsoft, Amazon, and others have entered into content licensing deals with publishers in order to access and use their works to build and operate AI systems.”
- “Some researchers estimate the AI training license market to be valued at US$2.5 billion now, and projected to grow to US$30 billion within a decade. Licensing structures continue to evolve that enable authors and content owners to participate in collective deals, receive attribution when AI tools rely upon their work, and be compensated for their contributions.
- “Significantly, despite entering into discussions with book publishers to acquire authorized copies of their works to train Llama, Meta instead chose to acquire texts from notorious pirate sites like ‘LibGen’ and ‘Anna’s Archive.’ In light of this history, it’s perhaps unsurprising that Meta seeks to deny the very existence of a viable market for AI training materials.
- “Common sense dictates that authors’ words themselves, not just ‘statistical information’ about them, are stored in the model. Otherwise how could the model capture ‘word order’ or ‘syntax?’ And how would Llama generate word-based output?
- “In addition to avoiding the inconvenience and expense of licensing and compensating copyright owners for the commercial use of their content, Meta opted to evade technological protections that are essential to a functioning online marketplace for copyrighted works. This is manifestly at odds with the mandate of Congress in adopting the DMCA [Digital Millennium Copyright Act]. A finding of fair use in this case would not only undermine the public interest in a workable copyright regime, but encourage and reward theft twice over.
- “Just as the long-term public interest is served by protecting the exclusive rights of copyright owners, the long-term potential of AI technology will only be realized by preserving the marketable rights that enable authors, publishers, and AI developers to engage in mutually beneficial commercial transactions.”
The association goes on to list AI licensing deals for text content, either publicly announced or reported by AAP member publishing companies. “Undoubtedly there are many more that are not known to AAP or are still in the pipeline.”

Image and data: Association of American Publishers
A copy of the AAP amicus brief can be found here (PDF).
More from Publishing Perspectives on copyright is here, more on issues in artificial intelligence is here, and more on the work of the Association of American Publishers is here.


