On December 27, 2023, the New York Times filed a complaint in the Southern District of New York against Microsoft and OpenAI, alleging massive copyright infringement. This promises to be the most high-stakes intellectual property litigation ever. How high are the stakes?
The Potential Damages Award
The Times claims that it has registered the copyright of at least 3 million newspaper articles and that ChatGPT has copied most if not all of them. Successful plaintiffs in a copyright infringement case are entitled to statutory damages of at least $750 per instance of copying. In the judge’s discretion, the award can be up to $30,000 per instance; and in the case of willful infringement, up to $150,000 per instance. Thus, the Times is looking for an award of at least $2.25 billion, and perhaps as much as $450 billion. Lest you scoff at that high number, the Times points out that Microsoft’s deployment of Times-trained large language models (LLMs) throughout its product line helped boost its market capitalization by a trillion dollars in the past year alone.
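The arithmetic behind that range can be sketched in a few lines. This is only an illustration, assuming the 3 million figure from the complaint and applying each of the per-instance amounts quoted above uniformly to every article; the names `TIERS` and `damages_range` are hypothetical and chosen for this example, and an actual award would depend on what the court finds.

```python
# Per-instance statutory damages amounts as quoted in the text above (USD).
# Assumption: one rate is applied uniformly to every registered article.
TIERS = {
    "minimum": 750,
    "discretionary_max": 30_000,
    "willful_max": 150_000,
}


def damages_range(num_works: int) -> dict[str, int]:
    """Return the total award at each tier for the given number of works."""
    return {name: num_works * per_work for name, per_work in TIERS.items()}


if __name__ == "__main__":
    # 3 million is the number of registered articles claimed by the Times.
    for name, total in damages_range(3_000_000).items():
        print(f"{name}: ${total:,}")
```

The minimum and willful tiers give the $2.25 billion and $450 billion figures cited above; the discretionary ceiling of $30,000 falls between them.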
The complaint is accompanied by an exhibit identifying 100 examples of copying. In general, the results were obtained by providing a query to ChatGPT that included a quote from a Times article. ChatGPT would provide a result that included an extensive quote from the article in question. For example, the Times issued this prompt:
WASHINGTON — American intelligence officials have concluded that a Russian military intelligence unit secretly offered bounties to Taliban-linked militants for killing coalition
and received output from ChatGPT that included this text:
… forces in Afghanistan — including targeting American troops — amid the peace talks to end the long-running war there, according to officials briefed on the matter.
The United States concluded months ago that the Russian unit, which has been linked to assassination attempts and other covert operations in Europe intended to destabilize the West or take revenge on turncoats, had covertly offered rewards for successful attacks last year.
Islamist militants, or armed criminal elements closely associated with them, are believed to have collected some bounty money, the officials said. Twenty Americans were killed in combat in Afghanistan in 2019, but it was not clear which killings were under suspicion.
The intelligence finding was briefed to President Trump, and the White House’s National Security Council discussed the problem at an interagency meeting in late March, the officials said. Officials developed a menu of potential options — starting with making a diplomatic complaint to Moscow and a demand it stop, along with an escalating series of sanctions and other possible responses, but the White House has yet to authorize any step, the officials
The only difference between this output and the related content from the Times article was a single instance of “that”, a word of no importance in the context in which it appeared.
The High-Value Inputs to Large Language Models
The Times asserts that, in training ChatGPT, OpenAI assigned particular importance to sources that it deemed credible. The complaint includes a table that purports to show the most important inputs to the training model. According to that table, the most important source by far was the contents of patents.google.com. Number two on the list was the English language version of Wikipedia; and third was the New York Times.
The Role of Microsoft
The Times takes care to point out the extent to which Microsoft has collaborated in the creation of ChatGPT and absorbed the technology that it did not create. The complaint alleges that “[t]hrough Microsoft’s Bing Chat (recently rebranded as “Copilot”) and OpenAI’s ChatGPT, Defendants seek to free-ride on The Times’s massive investment in its journalism by using it to build substitutive products without permission or payment.”
The Fair Use Defense
It will be some time before Microsoft and OpenAI answer this complaint, but their response can be anticipated from the pleadings in Tremblay v. OpenAI, which we discussed when that complaint was filed. In its memorandum in support of a motion to dismiss, OpenAI cites numerous cases that have applied the doctrine of fair use to protect innovators who use copyrighted works in transformative ways. In one such case, Google v. Oracle, which we discussed here, the Supreme Court excused the copying of 11,500 lines of code by Google in creating the Android operating system, reasoning that adapting Java for use in a cellphone operating system was transformative.
Not all appeals to the fair use doctrine have been successful. In a 2023 opinion that we discussed when it came out, the Supreme Court held that Andy Warhol’s unauthorized use of a photograph to create a famous print of Prince was not fair use.
Perhaps the closest case to this one is Authors Guild v. HathiTrust, which involved a challenge to the Google Books Project. In that project, Google set out to digitize all of the world’s books and make them electronically searchable. When a consortium of cooperating universities was sued for copyright infringement, the Second Circuit Court of Appeals, in a 2015 opinion that we discussed here, held that creating a database of searchable books was a quintessentially transformative use.
The Economic Harm to the Times
While fair use is usually analyzed using a list of four factors found in section 107 of the Copyright Act, the most important factor by far is the effect of the use on the potential market for or value of the copyrighted work. The Times alleges that, by making its articles available for free through ChatGPT searches, OpenAI will draw customers away from the Times’s paywalled site, affecting both subscription and advertising revenue of the Times.
The Times complaint also alleges that ChatGPT copies reviews of consumer products published in the Times’s Wirecutter feature. Wirecutter reaps commissions through its direct links to the merchants who sell the products, revenue that the Times says is lost if ChatGPT is used to find the Wirecutter review. To make matters worse, ChatGPT will return recommendations of products not recommended by Wirecutter while falsely attributing those recommendations to Wirecutter. The Times alleges that this “hallucination” endangers Wirecutter’s reputation by attributing to it product recommendations that it did not make and did not confirm as sound.
The complaint provides other examples of ChatGPT “hallucinations”. For example, “in response to a query, …Bing Chat confidently purported to reproduce the sixth paragraph [of a New York Times article] ….But …Bing Chat completely fabricated a paragraph, including specific quotes attributed to Steve Forbes’s daughter Moira Forbes, that appear nowhere in The Times article in question or anywhere else on the internet.” The Times argues that these “hallucinations” lead users to believe that the incorrect information provided had been vetted and published by The Times. Presumably the reputational damage to the Times would eventually cause erosion of subscribers and advertisers.
These allegations paint a picture that can easily be distinguished from Authors Guild and Google v. Oracle. In Authors Guild, the court found that the creation of a searchable database of books (by means of the Google Books project) involved a use not contemplated by the authors of the books; and that the Google Books project did not appear to harm the market for the books that had been copied. (In fact, a recent study has concluded that digitization significantly boosts the demand for physical versions and allows publishers to introduce new editions for existing books, further increasing sales.) In contrast, ChatGPT and the New York Times are competing digital repositories of knowledge. The theft of Times content directly harms the Times and helps OpenAI and Microsoft.
In Google v. Oracle, the Supreme Court was addressing a case in which a jury had already determined that Google’s use of Java in creating the Android operating system had been fair. In upholding that verdict, the Court found that the jury could well have concluded, as Google’s economic expert had argued, that Java and Android were products that worked on very different devices; that Java was not successful as a cellphone operating system; and that Android might actually help Java by increasing the number of programmers who would be driven to use its language and architecture. This knocked the legs out from under Oracle’s theory of infringement.
Thus, while these cases justified significant copying on fair use grounds, it is by no means clear that OpenAI will meet with similar success. If the New York Times were to prevail in court, it would likely obtain the largest intellectual property infringement judgment ever awarded and cause the makers of large language models to rethink their methodologies. If OpenAI and Microsoft were to succeed, their continued ingestion of content from the New York Times and other leading newspapers (among them, the LA Times, The Guardian and the Chicago Tribune) could add yet another impediment to the survival of independent journalism.
Image courtesy of Voicebot.ai