The rise of artificial intelligence tools that harness vast amounts of Internet content has begun to test the limits of copyright law.
Authors and a major photography agency have filed lawsuits over the past year, alleging that their intellectual property was illegally used to train artificial intelligence systems, which can produce humanlike prose and power applications like chatbots.
Now the news industry has joined them. The New York Times filed a lawsuit on Wednesday accusing OpenAI and Microsoft of copyright infringement, the first such challenge by a major American news organization over the use of artificial intelligence.
The lawsuit claims that OpenAI’s ChatGPT and Microsoft’s Bing Chat can produce content nearly identical to Times articles, allowing the companies to “leverage the Times’ enormous investment in its journalism by using it to create substitute products without permission or payment.”
OpenAI and Microsoft have yet to respond in court. But after the lawsuit was filed, the companies said they were in talks with several news organizations about using their content and, in OpenAI’s case, had begun signing deals.
Absent such agreements, the boundaries may be settled in court, with significant repercussions. Data is crucial both to the development of generative AI technologies (which can produce text, images and other media on their own) and to the business models of the companies building them.
“Copyright will be one of the key points shaping the generative AI industry,” said Fred Havemeyer, an analyst at financial research firm Macquarie.
A central consideration is copyright law’s “fair use” doctrine, which allows creators to build on copyrighted work without permission under certain conditions. Among other factors, defendants in copyright cases generally must show that they substantially transformed the content and that they are not competing in the same market as a substitute for the original creator’s work.
A review that quotes passages from a book, for example, could be considered fair use because it relies on that content to create a new and distinct work. On the other hand, selling long excerpts from the book would likely not qualify.
Courts have not weighed in on how those standards apply to artificial intelligence tools.
“There’s no clear answer as to whether in the United States that’s copyright infringement or fair use,” said Ryan Abbott, a lawyer at Brown Neri Smith & Khan who handles intellectual property cases. “In the meantime, we have many ongoing lawsuits with potentially billions of dollars at stake.”
It could be some time before the industry gets definitive answers.
Lawsuits raising these issues are in the early stages of litigation. Unless they are settled, as most lawsuits are, it could take years for a federal district court to rule on the matter. Those rulings would likely be appealed, and appeals courts in different circuits could reach conflicting decisions, potentially elevating the issue to the U.S. Supreme Court.
Getting there could take about a decade, Abbott said. “A decade is an eternity in the market we live in today,” he said.
The Times said in its lawsuit that it had been in talks with Microsoft and OpenAI about terms to resolve the dispute, possibly including a license. The Associated Press and Axel Springer, the German owner of outlets like Politico and Business Insider, recently reached data licensing agreements with OpenAI.
Taking the cases to trial could answer vital questions about what copyrighted data AI developers can use and how. But a lawsuit can also simply serve as leverage for a plaintiff seeking a more favorable licensing agreement in a settlement.
“Ultimately, whether or not this lawsuit ends up shaping copyright law will depend on whether the lawsuit is really about the future of fair use and copyright, or whether it is a salvo in a negotiation,” Jane Ginsburg, a professor at Columbia Law School, said of The Times’s lawsuit.
How the legal landscape develops could shape the nascent but highly capitalized AI industry.
Some AI companies were flooded with venture capital last year after the public launch of ChatGPT went viral. A share sale under discussion could value OpenAI at more than $80 billion; Microsoft has invested $13 billion in the company and has incorporated its technology into its own products. But questions about using intellectual property to train models have been a major concern for investors, Havemeyer said.
Competition in the field of AI can come down to data haves and have-nots.
Companies with rights to large amounts of data, such as Adobe and Bloomberg, or that have accumulated their own data, such as Meta and Google, have begun to develop their own artificial intelligence tools. Havemeyer noted that an established company like Microsoft was well equipped to secure data licensing agreements and address legal challenges. But startups with less capital may have a harder time getting the data they need to compete.
“Generative AI begins and ends with data,” Havemeyer said.
Benjamin Mullin contributed reporting.