Navigating the Digital Frontier: Exploring Copyright Challenges in the Age of ChatGPT — Cornell Undergraduate Law & Society Review

By: Sam Jacobson
Volume IX – Issue I – Fall 2023

Introduction

This article will first review relevant copyright law, defining what exactly a copyright entails and discussing the policy justifications behind the copyright system in the United States. Then, this article will discuss two copyright cases, currently before the U.S. District Court for the Northern District of California, challenging the language models of OpenAI, the company behind ChatGPT. As these cases are still pending, this article will analyze some of the counts in the two cases and assess the validity of each. The article concludes by considering the future implications of whatever determination the District Court reaches. I will argue that, in the cases of Silverman v. OpenAI and Tremblay v. OpenAI, the District Court could find that direct copyright infringement by OpenAI is excused by the fair use exemption, that there is not a significant degree of vicarious infringement by OpenAI, and that OpenAI did violate the Digital Millennium Copyright Act (DMCA) by distributing the works in question with copyright management information (CMI) removed without the Plaintiffs’ permission.

I. Background on Copyright Law

i. Defining Copyright

A copyright is one of three broad categories of intellectual property, used by authors of creative works to protect their works from infringement. More specifically, copyright protects “original works of authorship, fixed in any tangible medium of expression now known or later developed, from which they can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device” [1]. As outlined in 17 U.S. Code §102, the works of authorship eligible for copyright protection include:

a) Literary works

b) Musical works

c) Dramatic works

d) Pantomimes and other choreographic works

e) Pictorial, graphic, and sculptural works

f) Motion pictures and other audiovisual works

g) Sound recordings; and

h) Architectural works

Additionally, works eligible for copyright must be fixed in a “tangible medium of expression,” meaning that they must be recorded in some manner. For example, for a play to be copyrighted, it must exist in some fixed form, whether a written script or a video recording of a performance. Ultimately, copyright for a work exists “the moment a work is created” [2], as long as the work falls within one of the aforementioned categories and is recorded in a tangible medium of expression.

ii. Purpose of Copyright

As for the purpose of copyright, Article I, Section 8, Clause 8 of the US Constitution, known as the Intellectual Property Clause, explains that copyright is meant to “promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries” [3]. For the duration of the author’s life plus 70 additional years [4], authors of creative works are granted exclusive rights in their work. 17 U.S. Code §106 lists these exclusive rights [5], which include:

a) To reproduce the copyrighted work in copies or phonorecords

b) To prepare derivative works based upon the copyrighted work (e.g., sequels to a movie)

c) To distribute copies or phonorecords of the copyrighted work to the public by sale or other transfer of ownership, or by rental, lease, or lending

d) To perform the copyrighted work publicly (such as a play or motion picture)

e) To display the copyrighted work publicly (such as sculptures)

f) To perform the copyrighted work publicly by means of digital audio transmission (playing a sound recording, such as a song)

iii. Policy Justification of Copyright

Through these exclusive rights, from a policy perspective, Congress seeks to promote scientific innovation and the development of new creative works, as such works may be desired by or beneficial to society. The concern is that, if an author invested a great deal of time, effort, and money into a creative work, and someone else simply copied that work and sold it at a cheaper price, the copying would undermine the author’s investment. The author might then have no incentive to make the work in the first place and might instead wait for someone else to put in the effort. This concept is known as the Free Rider Problem: an economic concern that people could “free-ride,” or obtain a benefit from someone else’s investment in creating a work. Free riding would undermine the intellectual property system and could either disincentivize people from making creative works or lead property owners to invest insufficient resources in their works, since they might be unable to recoup their investments [6].

With this background in mind, this article will analyze two ongoing suits against OpenAI Inc., the company behind ChatGPT, an artificial intelligence (AI) model that relies on input data in its training to grow and develop. Here, the plaintiffs allege copyright infringement on the part of OpenAI because, in training ChatGPT, OpenAI used authored works owned by the plaintiffs without their permission. In one of the suits, the plaintiffs allege an additional violation of the Digital Millennium Copyright Act (DMCA), which will now be discussed in relation to the cases in question.

II. The Digital Millennium Copyright Act (1998)

In 1998, the Digital Millennium Copyright Act (DMCA) was passed by Congress “to address important relationships between copyright and the Internet”, ultimately amending US copyright law [7]. With the DMCA came three major updates to US copyright law:

(1) Established protections for online service providers in certain situations if their users engage in copyright infringement, including by creating a notice-and-takedown system, allowing copyright owners to inform online service providers about infringing material so it can be taken down

(2) Encouraged copyright owners to give greater access to their works in digital formats by providing them with legal protections against unauthorized access to their works (for example, hacking passwords or circumventing encryption)

(3) Made it unlawful to provide false copyright management information (for example, names of authors and copyright owners, titles of works) or to remove or alter that type of information in certain circumstances.

These primary components of the DMCA will be taken into account for the relevant suits.

III. Two Pending Class Action Suits: Tremblay v. OpenAI and Silverman v. OpenAI

OpenAI [Defendants] is currently facing two class action lawsuits on the grounds of alleged copyright infringement. The Plaintiffs in both Tremblay and Silverman are authors (each of whom claims to have registered copyrights in the books they wrote) who allege that their books were used to train the language models behind ChatGPT [8]. They allege that the “large language models” of ChatGPT “train” by studying a large amount of “training data,” from which the models derive abstract “patterns and connections”, and then repurpose these “patterns and connections” to interpret user prompts and generate “convincingly naturalistic text outputs” [9]. Plaintiffs brought these lawsuits because they suspect that OpenAI used the “patterns and connections” in their books to teach its models how to “converse” with users [10]. The basis for that suspicion is that, when Plaintiffs prompted ChatGPT to “summarize [their books] in detail,” ChatGPT was able to generate more-or-less “accurate” summaries containing information expressed in Plaintiffs’ copyrighted works [11]. In these suits, Plaintiffs assert six causes of action [12]:

(1) Direct copyright infringement (Count I)

(2) Vicarious infringement (Count II)

(3) Violation of Section 1202(b) of the Digital Millennium Copyright Act (Count III)

(4) Unfair competition under California Business & Professions Code Section 17200 (Count IV)

(5) Negligence (Count V)

(6) Unjust enrichment (Count VI)

To narrow the scope of focus for this article, the validity of Counts I through III will be analyzed here for these suits, determining whether they favor the Plaintiffs or the Defendants and how that might affect the overall decision of the class action suits.

IV. Analysis

i. Count I: Direct Copyright Infringement

At first glance, Defendants are arguably engaging in copyright infringement: they have taken these authors’ works and have reproduced copies of them whenever ChatGPT is asked to describe or summarize these works. Here, the Plaintiffs have demonstrated that they have valid copyrights and that Defendants are infringing on their copyrights with the language models of ChatGPT. Now, the burden of proof shifts to Defendants to prove either a) that the copyrights should not be valid or b) that they did not infringe upon the copyrights. Given the circumstances of this case, they would likely raise an affirmative defense to the alleged copyright infringement: fair use. Fair use of a copyrighted work, including reproduction of a work, is allowed for purposes including “criticism, commentary, news reporting, teaching (including multiple copies for classroom use), scholarship or research” [13]. By raising this affirmative defense, Defendants would argue that, while they did copy Plaintiffs’ works, the copying fell into an acceptable category of fair use such that they did not unlawfully infringe upon those works. In a court of law, four factors are considered in testing whether or not actions by an infringing party constitute fair use:

(1) The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

(2) The nature of the copyrighted work;

(3) The amount and substantiality of the portion used in relation to the copyrighted work as a whole;

(4) The effect of the use upon the potential market for or value of the copyrighted work.

The four-part fair use test can be seen in Campbell v. Acuff-Rose (1994), in which Acuff-Rose Music, which owned the rights to singer Roy Orbison’s “Pretty Woman,” sued for copyright infringement in response to a parody version of “Pretty Woman” by the band 2 Live Crew (with Luther Campbell among the band members) [14]. 2 Live Crew had asked Acuff-Rose for permission to release the parody and promised to pay for rights and licensing fees, but Acuff-Rose denied the offer, and, undeterred, 2 Live Crew released the song anyway without permission. Because 2 Live Crew’s parody had copied both the famous guitar riff from “Pretty Woman” and some of the lyrics, they were clearly infringing on the copyright owned by Acuff-Rose, so 2 Live Crew raised an affirmative fair use defense for their parody. Here, the Court applied the four-part fair use test:

(1) The nature/purpose of the use: includes two elements

(a) Commercial: Here, the releasing of a parody to the public is definitely commercial

(b) Non-profit, educational purposes: Here, while a parody is not written within the statute for fair use, the Court considered it to be a type of commentary and criticism that deserves special protection for fair use

(2) The nature of the copyrighted work

(a) This is a more negligible component of the fair use test, calling for recognition that some works are closer to the core of intended copyright protection than others, with the consequence that fair use is more difficult to establish when the former works are copied

(b) Here, the Court says that while the creative expression in Orbison’s original falls within the core of copyright’s protective purposes, this portion of the test is negligible

(3) The amount and substantiality of the portion used in relation to the copyrighted work as a whole

(a) Quantitatively: The guitar riff and some lyrics were taken, but the overall lyrics and music progression were different from the original work

(b) Qualitatively: 2 Live Crew incorporated the heart of “Pretty Woman”, but the Court acknowledged that a parody seeks to be recognizable but different

(4) The effect of the use upon the potential market for or value of the copyrighted work.

(a) The Court found that there is a significant likelihood that no one would buy 2 Live Crew’s parody instead of the original “Pretty Woman” for the original purpose

After going through this test, the Court ended up siding with 2 Live Crew (Campbell) on the grounds of fair use. As laid out, this analysis can be applied to the two class action suits with OpenAI:

(1) The nature/purpose of the use:

(a) Commercial: ChatGPT is currently free, and the information it produces is not sold to the public; instead, it is used freely to respond to questions asked by users.

(b) Non-profit, educational purposes: Here, Defendants could argue that ChatGPT is trying to educate those who want to learn more about Plaintiffs’ books through summaries or discussion of major themes of the books.

(2) The nature of the copyrighted work

(a) This part of the test is more difficult to deduce, as what is closer to the core of intended copyright protection is ultimately determined by a court of law. Ultimately, this test is negligible in deciding whether or not Defendants’ actions constituted fair use, though Plaintiffs could argue that literary works are at the core of copyright protection.

(3) The amount and substantiality of the portion used in relation to the copyrighted work as a whole

(a) Quantitatively and qualitatively, ChatGPT arguably uses the entire copyrighted work to express information about the works, doing so at the inquiry of the user.

(4) The effect of the use upon the potential market for or value of the copyrighted work.

(a) One could argue that no one is using ChatGPT as a substitute for purchasing the actual books, especially given how new ChatGPT is. Arguably, ChatGPT does not currently have a significant effect upon the market for literary works.

Considering all of these factors, while ChatGPT does essentially replicate copyrighted literary works in order to inform users about them (whether a chapter, summary, theme, etc.), a court of law could arguably side with OpenAI on the count of direct infringement in light of the fair use exemption. ChatGPT in its current form does not aim to monetize the reproduction of these works, and its function is arguably to educate users who pose questions to the AI model. There is arguably a negligible effect on the market for literary works given how new ChatGPT is; logically, if someone wants to read an actual book, they would most likely purchase the book in either physical or digital form. Plaintiffs could present evidence of the growing reliance on technology, including artificial intelligence, in modern-day society, arguing that ChatGPT has the potential to become considerable competition for access to the works owned by the Plaintiffs. However, a court would most likely consider the current-day functionality of ChatGPT rather than its potential, as potential harm is not necessarily factual or true at the present moment.

ii. Count II: Vicarious Infringement

Vicarious infringement, compared to direct copyright infringement, is “a form of secondary liability” where “a person may be held liable for the infringing acts committed by another if he or she had the right and ability to control the infringing activities and had a direct financial interest in such activities” [15]. Here, Plaintiffs argue that OpenAI is vicariously liable for the creation of infringing derivative works with every output that ChatGPT generates in response to user prompts [16]. However, this is arguably a weak argument for infringement, as demonstrated in Enterprise Management Limited, Inc., v. Construx Software Builders, Inc. (2023), in which the Ninth Circuit held that a copyright owner’s registration of a derivative work must encompass or express copyrighted elements of the original work [17]. By deeming any output of ChatGPT a derivative work, Plaintiffs are arguably attempting to expand the accepted definition of a derivative work beyond what copyright law supports. On this count, one could therefore find that there is not a sufficient degree of vicarious infringement on the part of OpenAI. Plaintiffs could argue that the law is meant to be open to expansion in order to address evolving societal concerns (such as the rising prominence of artificial intelligence); however, their definition of a derivative work is still arguably too broad to be accepted as part of copyright law.

iii. Count III: Violation of the DMCA

Plaintiffs allege violation of Section 1202(b) of the DMCA, which prohibits the “intentional removal or alteration” of copyright management information and “distribution or importation of copyright management information knowing that it has been removed, without the authority of the copyright owner” [18]. Copyright management information (CMI) is “information about a copyrighted work, its creator, its owner, or use of the work that is conveyed in connection with a copyrighted work” [19]. Here, Plaintiffs allege that OpenAI “intentionally removed CMI from their infringed works during the training process” of the language models of ChatGPT [20]. Furthermore, they allege that OpenAI “distributed the models’ outputs without Plaintiffs’ CMI” [21]. Here, it is more difficult to reach a definitive conclusion, as this case is ongoing. For the first allegation, Plaintiffs need to provide sufficient evidence that Defendants intended to remove the CMI of their works from the language models, as that intent determines whether Section 1202(b) of the DMCA was violated. For the second allegation, while OpenAI could argue that it was not physically distributing copies of the Plaintiffs’ works, one could find that the virtual nature of the models allows the works to be distributed around the world, wherever a user inquires about the works. So, while the first allegation of violating the DMCA is inconclusive for now, one could argue that OpenAI did violate the DMCA by distributing these works with the CMI removed, without Plaintiffs’ permission. Defendants could argue that this would be a misinterpretation of the DMCA, as it does not explicitly include the (potential) virtual distribution of works within the definition of distributing copies of the works owned by the Plaintiffs. However, society has largely adapted to include virtual distribution (through various mediums) as a method for transferring copies of works without physical contact.

V. Conclusion & Future Implications

To conclude, one could find that OpenAI did not commit direct copyright infringement on the grounds of fair use, as the platform does have educational benefits and does not charge users money to use it for information. Furthermore, one could find that OpenAI did not vicariously infringe on Plaintiffs’ works by simply answering a question a user poses to it, as copyright law does not generally interpret derivative works so broadly as to cover any output. Finally, one could argue that, while sufficient evidence will be needed to prove intentional removal or alteration of the CMI of the Plaintiffs’ works, OpenAI did violate the DMCA by distributing these works with the CMI removed, even if it was through a virtual channel. In terms of future implications, there are ongoing dilemmas about how artificial intelligence should be integrated into both society and the laws structuring society. There is currently a struggle to balance the potential benefits provided by AI with the rights of humans, notably under copyright law. In the future, as AI becomes more prominent, there may be greater concerns about its influence on local and national markets, such that it may be found to be committing copyright infringement by reproducing works, as in the case of ChatGPT.

Endnotes

[1] 17 U.S. Code § 102- Subject matter of copyright: In general (1976).

[2] “Copyright in General”. U.S. Copyright Office.

[3] Article I, §8, Clause 8. Intellectual Property Clause.

[4] “How Long Does Copyright Protection Last?”. U.S. Copyright Office.

[5] 17 U.S. Code § 106- Exclusive rights in copyrighted works (1976).

[6] “Property, Intellectual Property, and Free Riding” (2005). Stanford Law School: Stanford Lawyer Magazine.

[8] Silverman v. OpenAI and Tremblay v. OpenAI, Latham & Watkins LLP (pending).

[9] Ibid at 15.

[10] Ibid at 14.

[11] Ibid at 14.

[12] Ibid at 15.

[13] 17 U.S. Code § 107- Limitations on exclusive rights: Fair use (1976).

[14] Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569 (1994).

[15] “Vicarious Infringement”. Cornell Law School Legal Information Institute (1971).

[16] Silverman v. OpenAI and Tremblay v. OpenAI, Latham & Watkins LLP (pending), 19.

[17] Enterprise Management Limited, Inc., v. Construx Software Builders, Inc., et al, No. 22-35345 (9th Cir. 2023).

[18] Digital Millennium Copyright Act §1202(b), H.R. 2281, 105th Congress (1998).

[19] “Copyright Management Information (CMI)”. Copyright Alliance.

[20] Silverman v. OpenAI and Tremblay v. OpenAI, Latham & Watkins LLP (pending), 21-22.

[21] Silverman v. OpenAI and Tremblay v. OpenAI, Latham & Watkins LLP (pending), 22.
