OpenAI and an array of OpenAI-affiliated entities (collectively, “OpenAI”) have responded to the copyright-centric lawsuit waged against them by the New York Times, asking the court to toss out a number of the news publisher’s “legally infirm” claims and “focus the litigation on the core issues that really matter.” And in what is sure to be a headline-grabbing angle of OpenAI’s motion to dismiss, the generative AI giant asserts at the outset that despite the Times’ arguments to the contrary, its ChatGPT platform “is not in any way a substitute for a subscription to the New York Times,” and in reality, the Times had to “[pay] someone to hack OpenAI’s products” – and make “tens of thousands of attempts” – in order to produce the allegedly infringing outputs that form the basis of its complaint.
Some Background: The New York Times filed suit against OpenAI and partner Microsoft in a New York federal court in December. According to the complaint, the defendants are on the hook for making “unlawful use of the Times’s work to create artificial intelligence products that compete with it [and that] threatens the Times’s ability to provide [trustworthy information, news analysis, and commentary].” The defendants’ generative AI tools “rely on large-language models that were built by copying and using millions of the Times’s copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more,” the paper asserts, setting out claims of copyright infringement, violations of the Digital Millennium Copyright Act, unfair competition by misappropriation, and trademark dilution.
Setting the stage in its February 26 motion to dismiss, OpenAI urges the court to dismiss a handful of the New York Times’ claims, which “are not viable, even as pleaded.” In a nutshell, OpenAI states that the claims at issue and the problems with them are as follows …
(1) The Times’ direct copyright infringement claim asserts liability based “in part” on conduct that is barred by the Copyright Act’s three-year statute of limitations, OpenAI argues, and thus, that portion of the claim should be dismissed. The conduct at issue, which occurred “more than three years ago,” according to OpenAI, primarily consists of the construction of the “WebText” dataset and OpenAI’s use of that dataset to train GPT-2; the construction of an “expanded version of the WebText dataset” called “WebText2”; and the use of WebText2 and Common Crawl to train GPT-3.
(2) The Times falls short on its contributory infringement claim – which aims to hold OpenAI liable for “materially contribut[ing] to and directly assist[ing] with the direct infringement perpetrated by end-users of the GPT-based products” – as it has not alleged that OpenAI had actual knowledge of specific infringements. Here, OpenAI asserts that “the only allegation supporting the Times’s contributory claim states that OpenAI ‘had reason to know of the direct infringement by end-users’ because of its role in ‘developing, testing, and troubleshooting’ its products.” But such “generalized knowledge” of “the possibility of infringement” is not enough, it argues.
(3) The Times’ claim for violations of the Digital Millennium Copyright Act (“DMCA”) (17 U.S.C. § 1202), which prohibits the “[r]emoval or [a]lteration” of copyright management information (“CMI”), fails on two different fronts.
> S. 1202 claim based on the removal of CMI in the training of OpenAI’s models: The first s. 1202 violation alleged in the complaint asserts that OpenAI “removed” CMI “in building the training datasets” in violation of Section 1202(b)(1) of the DMCA. This claim fails, per OpenAI, as the New York Times does not plausibly allege that any CMI was removed, and in fact, the publisher actually “concedes that some CMI was preserved.” And more than that, none of the Times’ specific allegations “actually suggest OpenAI designed its alleged ‘scrap[ing]’ process to omit CMI,” its complaint “lacks allegations about the inclusion (or exclusion) of the Times’s CMI in any ‘third-party datasets’ … much less about OpenAI scrubbing any CMI from those datasets,” and there is “no allegation in the complaint supporting the conclusion that [OpenAI’s] ‘training process’ excludes CMI ‘[b]y design.’”
> S. 1202 claim based on the lack of CMI in the OpenAI output: The second category of s. 1202 violation in the complaint alleges that OpenAI violated Section 1202(b)(1)’s removal prohibition by failing to include the Times’s CMI in model outputs, and by displaying those outputs via ChatGPT “knowing that [CMI] has been removed.” The Times fails here, as well, according to OpenAI, since the allegedly infringing outputs “are not wholesale copies of entire Times articles.” Instead, “They are, at best, reproductions of excerpts of those articles, some of which are little more than collections of scattered sentences.”
Even setting the foregoing aside, OpenAI argues that the New York Times fails to allege a CMI-based injury. Here, the Times alleges that any harm “relates entirely to its inability to receive speculative licensing revenue, and the possibility that ChatGPT will ‘divert readers’” – neither of which has “any nexus to CMI.”
(4) The Times’ claim for unfair competition by misappropriation (under New York State law) is preempted by the federal Copyright Act, and thus, should be dismissed.
The “Genuinely Important” Issue
Against that background, OpenAI seems to leave a few of the Times’ claims in place: part of its claim for direct copyright infringement (namely, for any conduct that occurred within the three-year statute of limitations), its vicarious copyright infringement claim, and its trademark dilution claim. In keeping these causes of action in play, the ChatGPT developer maintains that “there is a genuinely important issue at the heart of this lawsuit – critical not just to OpenAI, but also to countless start-ups and other companies innovating in this space – that is being litigated both here and in over a dozen other cases around the country (including in this Court).”
The “genuinely important” issue: “Whether it is fair use under copyright law to use publicly accessible content to train generative AI models to learn about language, grammar, and syntax, and to understand the facts that constitute humans’ collective knowledge.” (Note: You may recall that OpenAI previously shed light on its position on fair use in response to the copyright case waged against it by a number of authors, including comedian Sarah Silverman.)
OpenAI states that it and other defendants in a growing number of similar lawsuits “will ultimately prevail because no one – not even the New York Times – gets to monopolize facts or the rules of language.” And for “good reason,” it says, noting that “there is a long history of precedent holding that it is perfectly lawful to use copyrighted content as part of a technological process that (as here) results in the creation of new, different, and innovative products.” Continuing on, the defendant asserts that “it has long been clear that the non-consumptive use of copyrighted material (like large language model training) is protected by fair use.”
THE BIGGER PICTURE: Reflecting on the importance of the case, Andres Guadamuz, a reader in intellectual property law at the University of Sussex, states that it “cannot be seen in isolation, [as] it is not only the strongest case so far against Generative AI companies, including the Getty lawsuits, but it should also be considered in the context of the ongoing battle between traditional media and technology, a struggle that has persisted for two decades.” He predicts that this case – which may bring to the forefront a legal argument already evident in the Getty Images suit in England, the non-infringing uses theory – “could prompt many other media companies to initiate proceedings against tech companies” for several years to come.
The case is New York Times Company v. Microsoft Corporation, et al., 1:23-cv-11195 (SDNY).