
Image: Getty – Igor Kutyaev
By Porter Anderson, Editor-in-Chief | @Porter_Anderson
Cambridge: ‘Meta Should Pay for the Content It has Stolen’
Following our coverage of London Book Fair's Main Stage discussion with Maria A. Pallante, president and CEO of the Association of American Publishers (AAP), and Dan Conway, CEO of the United Kingdom's Publishers Association, a new understanding is crystallizing around the world publishing industry, artificial intelligence (especially generative AI), and copyright policy and protection.
You'll recall that by March 17, these two leaders were gearing up for the quick closure of the United States' Trump administration's call for public comment on its "AI Action Plan."
Today (March 25), the Publishers Association in the United Kingdom has pointed to Alex Reisner's article at The Atlantic magazine in the States, in which Reisner writes, "When employees at Meta started developing their flagship AI model, Llama 3, they faced a simple ethical question. The program would need to be trained on a huge amount of high-quality writing to be competitive with products such as ChatGPT, and acquiring all of that text legally could take time. Should they just pirate it instead?"
And this, of course, dovetails perfectly with Pallante's closing comments in the AAP's response to the call for comment in the United States, in which she wrote, "Among our priorities is stopping the proliferation of pirate sites that are a scourge on American IP investments and an illegal source of AI development." More than once, she lands clearly on the point that copyrighted content is being pirated and then hoovered up as "training" material by large language models.
Reisner's article looks at Library Genesis, or LibGen, a pirated library that Meta's employees, with Mark Zuckerberg's permission, the article asserts, accessed for some of the company's unpaid use of copyrighted content. When internal Meta communications were revealed in a copyright-infringement lawsuit, this illicit use of copyrighted content was exposed not only at Meta but also at OpenAI and its ChatGPT.
And this has prompted the following response today in London from Catriona MacLeod Stevenson, the PA’s general counsel and deputy CEO:
“While we have long suspected that illegal pirate websites have been used in the training of LLMs, court documents reported by The Atlantic show that Meta employees were actively encouraged to download and use LibGen’s more than 7.5 million books and 81 million research papers to use to train its LLMs.
“This is infringement of authors’ and publishers’ copyright on a massive scale, and should not go unchallenged. The Publishers Association and its members are actively considering their next steps in this regard.
“Publishers – and other creative sectors – have said many times before, big tech companies can afford to pay for the content they use and should do so. There is a simple way to access the high-quality content AI developers wish to use to train LLMs, and that is paying for it, just as they pay for the electricity they use, in the ordinary course of doing business.
“As the UK Government considers the thousands of responses to its Copyright and AI Consultation, now is the time to make it clear that companies such as Meta need to be transparent about the copyright-protected works they have used and wish to use, and enter in good faith into licensing discussions so that rightsholders can be remunerated for their work.”
Cambridge Speaks Up
In addition, Cambridge University Press & Assessment has today provided Publishing Perspectives with a statement on the same issue.
Identified as coming from “a Cambridge spokesperson,” the statement reads:
“It is dismaying to learn that Meta turned to piracy to harvest content for its AI development, including books and journals from Cambridge authors.
“Meta should pay for the content it has stolen. It’s essential that governments and authorities do not let big tech companies get away with taking authors’ work without permission. This reinforces the risks of inadequate regulation and legislation around AI and copyright, such as the ‘opt-out’ system proposed in the UK.
“Effective licensing approaches are in place, and tech companies can reach licensing agreements with publishers and rightsholders for use in LLMs. There are legal ways to access and use copyright content. Piracy is not one of them. We and our partners are working to frustrate those who seek to steal copyrighted material, and we will take all appropriate action, including with industry bodies.”
We will of course continue to monitor events and news relative to AI and the international publishing industry.
A Programming Note
As international publishing professionals get ready for next week's Bologna Children's Book Fair (March 31 to April 3), it's good to remember that Jacks Thomas' Bologna Book Plus is staging a full-morning "AI Summit" on April 1 from 9:30 a.m. to 11:55 a.m., with seven sessions on the issue.
Full details of the program’s speakers and agenda are here.
More from Publishing Perspectives on artificial intelligence and world publishing is here, more on copyright is here, more on the freedom to publish is here, more on the Association of American Publishers is here, more on the UK’s Publishers Association is here, more on London Book Fair is here, and more on the International Publishers Association’s work is here.
Publishing Perspectives is the International Publishers Association’s world media partner.


