

The Overlooked Claim of The New York Times v. OpenAI: Harm to Copyright Management Information

Kelly Kutsor | Intern

The Digital Millennium Copyright Act (DMCA) was passed by Congress in 1998 to provide solutions for the strained relationship between the internet and copyright law. It focused on protecting copyright owners whose works were made available in digital form. Now, in the age of emerging AI technologies, the DMCA, specifically §1202, may be more important than ever.

What Is A DMCA §1202 Claim?

A DMCA §1202 violation occurs when a person harms the integrity of copyright management information (CMI). CMI is defined in §1202(c) to encompass a work’s title, the identity of the author, and the terms and conditions for use of the work. Harm to CMI can occur through the provision or distribution of false CMI (§1202(a)) or through the removal or alteration of CMI and the distribution of such altered works (§1202(b)). §1202(b) covers three different causes of action:

  1. the intentional removal or alteration of any CMI;
  2. the knowing distribution of CMI that has been removed or altered without the authority of the copyright owner; and
  3. the knowing distribution or public performance of works whose CMI has been removed or altered without the authority of the copyright owner or the law.

In each cause of action, it is a further requirement that the defendant knew, or had reasonable grounds to know, that the removal or alteration of the CMI would induce, enable, or conceal copyright infringement.

The New York Times’s §1202(b) Claim Against Microsoft And OpenAI

As previously discussed here, the New York Times in December 2023 filed a complaint against Microsoft and OpenAI in the Southern District of New York. The complaint alleged extensive copyright infringement on the part of Microsoft and OpenAI. It also alleged that Microsoft and OpenAI removed the Times’s CMI from the Times articles that were scraped in the training of the OpenAI and Microsoft large language models (LLMs).

The Times alleged that its infringed works all included CMI, such as “copyright notice, title and other identifying information, terms and conditions of use, and identifying numbers or symbols referring to the [CMI].” It alleged that Microsoft and OpenAI used millions of Times articles to train their LLMs and that during this training they removed the CMI from the articles. The Times claimed that when Microsoft and OpenAI’s LLMs created output for users, the LLMs created “copies or derivatives of Times’s works” that were missing the CMI. The complaint took care to track the precise wording of §1202(b) in its allegations, including an allegation that the removal of the CMI was done knowingly and with the intent to induce and conceal infringement of the Times’s copyrights.

The Times claimed that, by providing its articles as output without their associated CMI, the defendants distributed them while concealing that the output of their LLMs was infringing copyrighted works. Lastly, the Times claimed a non-specific injury due to the alleged removal of the CMI and claimed a right to the statutory damages afforded by the DMCA.

The Defendants’ Motion to Dismiss the §1202(b) Claims

In February 2024, Microsoft and OpenAI filed a motion to dismiss some of the Times’s claims, including all alleged §1202(b) violations. In the memorandum supporting the motion to dismiss, Microsoft and OpenAI stated that §1202(b) claims cannot succeed when the removal of CMI is an unintended result of an “automatic process,” citing a non-precedential order in Zuma Press, Inc. v. Getty Images (US), Inc.

Microsoft and OpenAI also argued that the §1202(b) claim should be dismissed because the Times failed to specify what exact CMI was included in each specific work alleged to have been infringed, and because the information, if available, was not actually “conveyed in connection with” the works, but rather was hidden in small text at the bottom of the page. Microsoft and OpenAI cited a recent case in the Northern District of California, Andersen v. Stability AI Ltd., in which the plaintiff’s §1202 claims were dismissed because of the failure to plead “the exact type of CMI included in each work.”

Microsoft and OpenAI next argued that the §1202 claims should be time-barred because the training took place more than three years before the complaint was filed, outside the three-year copyright statute of limitations. This argument raises the question of whether the three-year period begins when the unlawful act occurs or when the copyright owner discovers, or in the exercise of diligence should have discovered, the act. As is discussed in another Insights article, this issue was present in a very recent Supreme Court case, but the Court did not address it, leaving untouched (for now) precedent binding on the Southern District of New York that the three-year limitations period is tolled while the infringement is concealed from the copyright owner.

In a similar motion to dismiss in Daily News LP v. Microsoft Corporation, Microsoft and OpenAI also argued that the purpose of protecting CMI is to enable the public to identify the source of works, not to “govern purely internal databases,” drawing the conclusion that there can be no enabling of infringement within an internal database.

The defendants next moved on to attacking the §1202(b)(2) and (3) claims by arguing that the LLMs do not “distribute” infringing works. For support, they quote FurnitureDealer.Net, Inc. v. Amazon.com, Inc. as stating that “distribution” requires a “sale or transfer of ownership extending beyond that of a mere public display.” The defendants argue that the Times alleged only the creation of “mere public displays” rather than a sale or transfer of ownership of Times articles.

Microsoft and OpenAI also claim that the outputs of their LLMs reproduce only excerpts of the Times articles, “some of which are little more than collections of scattered sentences.” They cite Fischer v. Forrest for the proposition that §1202(b)(1) and (3) only apply when the works in question are “substantially or entirely reproduced.” Lastly, Microsoft and OpenAI point out that the Times’s examples of infringement in an exhibit to its complaint contain text from the middle of articles, text that contains no CMI that can be said to have been removed.

The defendants also contend that the Times lacks standing to allege a §1202(b) violation because it failed to show any injury caused by the alleged violations. They argued that the “Harm to the Times” section of the Times’s complaint, which expresses a fear of lost revenue and reader diversion, has nothing to do with the removal of CMI.

Microsoft and OpenAI also note that much of the allegedly infringing output was generated by feeding portions of the Times’s articles into the LLMs as prompts. Output created in this way suggests that the user already had access to the Times’s articles and thus knew the Times to be the source. The defendants argue that, under such circumstances, the Times can point to no plausible harm.

And Now… We Wait

Ultimately, we will all have to wait and see what happens next. Other similar cases, such as Doe v. Github, Inc., have met with varying degrees of success on §1202 claims at this very early stage of litigation. It is important to keep in mind that in order to survive a motion to dismiss, “a complaint must contain sufficient factual matter, accepted as true, to state a claim to relief that is plausible on its face.” Even if these §1202 claims are not dismissed in whole or in part, they will still have to succeed at trial in order to potentially set precedent.

If the Times or other similarly situated plaintiffs succeed, this could be a game changer for copyright holders everywhere, especially for copyright owners who have not registered their works with the Copyright Office. While copyright registration is needed to sue for infringement, it is not necessary for a DMCA §1202 claim. Thus, copyright holders everywhere should be including CMI on all of their works, especially those posted or shared online. Not only is this good practice for getting credit for one’s work, but it might just give them a claim if their work is scraped and regurgitated by an LLM or AI image generator.

