The Copyright Conundrum: Why India’s AI Future Hinges on Restoring Balance to an Ancient Law
From the Marrakesh Treaty to Text and Data Mining: The Battle Over Knowledge Access Enters a New Phase
There is a story that Pranesh Prakash, a technology law and policy consultant, tells about a former colleague named Nirmita. Nirmita is visually impaired. She wanted to read a book that was available from the United States in a disability-friendly format called DAISY—Digital Accessible Information System. Under the copyright laws that prevailed at the time, she could not legally obtain that book. Meanwhile, Prakash, as a sighted reader, could purchase any print or e-book he wished.
The absurdity of this situation is not merely personal; it is structural. Copyright law, designed to encourage creativity by granting authors limited monopolies over their works, had somehow evolved into a system that denied access to the very knowledge it was supposed to promote. A blind woman could not read a book because the law privileged the rights of publishers over the rights of readers—especially readers with disabilities.
That absurdity sparked a global movement. Years of advocacy by non-governmental organisations and international coalitions of disability rights groups led to the Marrakesh Treaty, which enables the cross-border exchange of accessible-format books and establishes national exceptions for visually impaired persons to use technology to convert books into accessible formats when publishers do not make them available. The copyright industry, from book publishers to the movie industry, opposed the treaty fiercely. Any exception to copyright law was viewed as fundamentally unacceptable, even when the price of that position was denying blind people access to books.
The struggle of visually impaired persons against overly rigid copyright laws highlights a fundamental problem: copyright has expanded far beyond its original purpose. What was once a limited monopoly designed to incentivise creation has become a nearly perpetual barrier to access, innovation, and progress. And now, with the rise of artificial intelligence, that problem has taken on new urgency.
The Historical Arc of Copyright
To understand where we are, it helps to understand where we came from. Humans have had art for far longer than we have had copyright. The Statute of Anne, widely regarded as the first copyright law, was passed in Britain in 1710—long after the era of Shakespeare and Milton. That law granted authors a limited monopoly of 14 years, with the possibility of one renewal. The monopoly right would only vest if the work was specifically registered and multiple copies deposited for distribution among libraries and universities. The default was the public domain. Copyright was the exception.
Britain brought copyright law to India in 1847. The current Copyright Act dates from 1957. The contrast with the Statute of Anne could not be starker. Under the current law, the monopoly right vests automatically the moment “a work” is created. No registration is required. No deposit is necessary. And the term of protection extends for the author’s entire lifetime plus 60 years posthumously.
This means that the thousands of random Instagram posts and notebook doodles you have made are all protected under copyright law for nearly a century. The public domain, once the default, has become a residual category—what is left over after centuries of accumulated copyrights have expired. A nearly perpetual copyright monopoly is now the default, regardless of the commercial potential of the work or the ambitions of the creator.
This fundamental change in the nature of the law has deleterious consequences. It creates a world where access to knowledge is increasingly mediated by rights holders, where the ability to build upon the past is constrained by legal permissions, and where technological innovation must navigate a thicket of potential infringements.
The AI Dimension
The AI Impact Summit now underway in New Delhi brings these issues into sharp focus. Large language models and other AI systems require vast quantities of training data. For language models, that data inevitably includes copyrighted works—books, articles, websites, poems, and more. The question of whether and how AI developers can use these materials without obtaining permission from every rights holder has become one of the most contested issues in technology policy.
Prakash and his colleagues at LIRNEasia, a Sri Lankan think tank, studied the data governance regimes of seven countries in South and Southeast Asia. Their findings are sobering. In four out of seven countries, the law makes search engines and AI training illegal. This is not a hypothetical concern; it is an actual legal reality.
Consider how search engines work. To index the web, they need to copy as much of it as they can, a process called “crawling”, effectively creating a mirror copy of everything reachable through links. This copying is essential to the function of search. But permissionless copying is prohibited by copyright law. Search engines can operate lawfully only where countries have created exceptions that recognise the difference between human reading and machine processing.
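For readers who want to see why copying is unavoidable, here is a minimal sketch in Python. It is an illustration only, not how any real search engine is built: the three-page "web" (TOY_WEB) and the function names crawl and build_index are hypothetical, and nothing is fetched over a network. The point it shows is that an index cannot be built without first storing a copy of each page's text.

```python
from collections import deque
import re

# Hypothetical three-page "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example": ("copyright law and search", ["b.example", "c.example"]),
    "b.example": ("search engines copy pages", ["a.example"]),
    "c.example": ("machines index text", []),
}

def crawl(start):
    """Breadth-first crawl: follow links and keep a copy of each page's text."""
    copies = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in copies or url not in TOY_WEB:
            continue  # already copied, or the link leads nowhere
        text, links = TOY_WEB[url]
        copies[url] = text  # the copying step that copyright law regulates
        queue.extend(links)
    return copies

def build_index(copies):
    """Inverted index: each word maps to the set of URLs containing it."""
    index = {}
    for url, text in copies.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index.setdefault(word, set()).add(url)
    return index

copies = crawl("a.example")
index = build_index(copies)
print(sorted(index["search"]))
```

Looking up a word in the index returns the pages that contain it, but the index exists only because crawl first stored every page in copies.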
The study found that only the Philippines and Sri Lanka have a flexible “fair use” exception that can accommodate these activities. India, to its credit, introduced a specific exception in 2012 for “the transient or incidental storage” for “providing electronic links, access or integration.” This exception covers search engines. But it does not clearly cover AI training.
In most countries surveyed, the legal status of AI training on copyrighted materials remains uncertain. This uncertainty is not benign; it chills innovation, discourages investment, and pushes AI development into jurisdictions with more sensible rules.
Sensible Exceptions Around the World
Other jurisdictions have recognised the need for clarity. The European Union has adopted text and data mining exceptions in its copyright regime. Japan has gone further, allowing an exemption for “Exploitations not for enjoying the ideas or emotions expressed in a work”—in other words, uses by machines rather than humans—and explicitly permits “using the work in data analysis.” Singapore has adopted a flexible fair use exception that can accommodate technological developments. South Korea and Hong Kong are in the process of doing the same.
The logic underlying these exceptions is straightforward: copyright was never meant to cover mechanistic uses. When a machine processes a text to identify statistical patterns, it is not “reading” in the human sense. It is not enjoying the ideas or emotions expressed. It is treating the text as data. The purpose of copyright—to incentivise the creation of works for human enjoyment and edification—is not implicated by such uses.
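To make "treating the text as data" concrete, here is a deliberately tiny sketch in Python. It is an assumption-laden stand-in, far simpler than how any AI model is actually trained: the function name bigram_counts and the sample sentence are invented for illustration. It counts how often each pair of adjacent words occurs, which is one elementary kind of statistical pattern.

```python
from collections import Counter
import re

def bigram_counts(text):
    """Count adjacent word pairs; the text is handled purely as data."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(words, words[1:]))

# Invented sample sentence, used only to demonstrate the counting.
sample = "the law protects the author and the law limits the term"
counts = bigram_counts(sample)
print(counts.most_common(1))
```

What the program keeps is a table of frequencies, not an appreciation of what the sentence says.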
India has not yet adopted a broad text and data mining exception. This creates legal uncertainty for AI developers who need to collect training data. And by not having a flexible, general, and open-ended exception like those in Singapore and the United States, India ensures that copyright law will always lag behind technological developments. Every new innovation will face the same uncertainty, the same legal risk, the same chilling effect.
The Misguided Focus on Job Protection
Opponents of text and data mining exceptions often raise concerns about the impact of generative AI on creative labour. If AI can produce text, images, and music, what will happen to writers, artists, and musicians? Should copyright law not protect their livelihoods?
These concerns are understandable, but they misunderstand what copyright is for. Copyright is meant to encourage creativity, not to protect jobs. It does so by granting authors limited monopolies over their works, allowing them to profit from their creations. But copyright has never prohibited learning from examples and imitating. Every artist studies predecessors. Every writer reads widely. The fact that AI can now do something analogous at scale does not change the fundamental principle.
Moreover, technology has always displaced jobs. We have far fewer rickshaw pullers, telegraphists, punkhawallahs, stenographers, lift operators, bank tellers, typesetters, darkroom technicians, and draughtsmen than we once did. Yet technology has also created new jobs, new industries, new forms of creativity. The advent of photography reduced the demand for portraitists but enabled new forms of visual expression and democratised access to images. The impact of generative AI is unknown. It may require new forms of social support, such as government grants for arts and culture or strengthened cooperative movements, funded by taxes on AI companies. But these are not issues to be dealt with in copyright law.
What Copyright Should Protect
If copyright should not be used to block AI development, what should it protect? Prakash offers a constructive answer: copyright law should protect contributions to the commons.
Open-licensed AI models and datasets exemplify this principle. Developers and researchers who absorb massive computational costs to create new capabilities and then release them freely are adding to the common heritage of mankind. They are not subtracting from it. Copyright law should encourage such contributions, not hinder them with the same restrictions designed to prevent commercial exploitation.
Governments are uniquely positioned to curate high-quality, locally relevant datasets for public consumption—at least when used for training open-source models. By making these datasets available, governments can ensure that AI development reflects local languages, local contexts, and local needs. They can also ensure that the benefits of AI are not captured entirely by large corporations based in wealthy countries.
This is not a pipe dream; it is a practical agenda. It requires legal reform to create certainty for AI training. It requires public investment in data infrastructure. It requires a shift in mindset from copyright maximalism to copyright balance.
The Weaponisation of Copyright
The history of copyright is littered with examples of the law being weaponised to block beneficial technologies under the guise of protecting creators. The Authors Guild in the United States used copyright to block Amazon Kindle’s “Read Aloud” function, despite it being assistive technology that enabled visually impaired persons to listen to books they had legally purchased. Publishers opposed the Marrakesh Treaty that would give blind people access to books. The pattern is consistent: rights holders invoke copyright to protect their business models, even when the cost is borne by the most vulnerable.
Current copyright law blocks technologies that could democratise access to knowledge, unleash creativity, and drive innovation—the very things that copyright was meant to foster. This is not a bug; it is a feature of a system that has been captured by incumbent interests. The original purpose of copyright—to promote the progress of science and the useful arts—has been lost in a thicket of extensions, expansions, and enforcements.
India’s Moment
India’s hosting of the AI Impact Summit is an opportunity. It is a chance to lead—not just in AI adoption, but in the legal and policy frameworks that will shape AI’s development. The choices India makes about copyright will ripple through the global system, influencing other countries and setting precedents that will be hard to reverse.
The path forward is clear. India should adopt a broad text and data mining exception that covers AI training. It should embrace flexible fair use provisions that can accommodate technological change without requiring constant legislative updates. It should curate and publish high-quality datasets for public use. And it should resist the lobbying of incumbent industries that seek to use copyright as a weapon against innovation.
This is not about being soft on piracy or indifferent to creators’ rights. It is about restoring balance to a system that has lost its way. Copyright was never meant to be a perpetual monopoly. It was never meant to block access for the disabled. It was never meant to criminalise the basic operations of search engines and AI. It was meant to promote creativity and knowledge. It is time to bring it back to those roots.
The story of Nirmita, the visually impaired woman who could not access a book, is a story about what happens when copyright forgets its purpose. The story of AI training data is a story about what happens when copyright fails to adapt. Both are stories of exclusion, of barriers, of law standing in the way of human flourishing.
India has the opportunity to write a different story. The AI Impact Summit is the moment to begin.
Q&A: Unpacking India’s Copyright Challenge in the AI Era
Q1: What is the significance of the Marrakesh Treaty story in understanding copyright’s problems?
A: The Marrakesh Treaty story illustrates how copyright maximalism harms the most vulnerable. Visually impaired persons like Nirmita were denied access to books in accessible formats because copyright law prioritised publisher rights over reader access. Publishers opposed the treaty fiercely, viewing any exception to copyright as unacceptable even when the cost was denying blind people the ability to read. This episode reveals a fundamental problem: copyright has expanded far beyond its original purpose of incentivising creativity and now actively obstructs access to knowledge. The same dynamic is playing out with AI, where copyright is being used to block beneficial technologies rather than promote progress.
Q2: How has copyright law changed from its origins to the present day?
A: The first copyright law, Britain’s Statute of Anne (1710), granted authors a limited monopoly of 14 years with one possible renewal. Copyright only vested upon registration, and copies had to be deposited for libraries and universities. The public domain was the default; copyright was the exception. Today, under India’s Copyright Act, copyright vests automatically the moment a work is created, requires no registration, and lasts for the author’s lifetime plus 60 years. This means every Instagram post and notebook doodle is protected for nearly a century. What was once a limited incentive has become a nearly perpetual barrier, with the public domain reduced to a residual category.
Q3: What did the LIRNEasia study find about copyright and AI training in South and Southeast Asia?
A: The study examined data governance regimes in seven countries. It found that in four out of seven, the law makes search engines and AI training illegal. Web search engines need to copy the web through “crawling,” effectively creating mirror copies, but permissionless copying is prohibited by copyright law. Only the Philippines and Sri Lanka have flexible “fair use” exceptions that can accommodate such activities. India has a specific 2012 exception for transient storage that covers search engines, but it does not clearly cover AI training. Most countries in the region lack legal clarity, creating uncertainty that chills innovation and pushes AI development to jurisdictions with more sensible rules.
Q4: How have other jurisdictions approached text and data mining for AI?
A: Several jurisdictions have adopted sensible approaches. The European Union has text and data mining exceptions in its copyright regime. Japan allows an exemption for “Exploitations not for enjoying the ideas or emotions expressed in a work”—machine uses rather than human uses—and explicitly permits “using the work in data analysis.” Singapore has adopted a flexible fair use exception. Hong Kong and South Korea are in the process of doing so. The common thread is recognition that copyright was never meant to cover mechanistic uses where machines process text as data rather than humans reading for enjoyment.
Q5: What should India do to bring copyright law into the 21st century?
A: Prakash argues for several concrete steps. First, adopt a broad text and data mining exception that clearly covers AI training. Second, embrace flexible fair use provisions that can accommodate technological change without constant legislative updates. Third, curate and publish high-quality, locally relevant datasets for public use, especially for training open-source models. Fourth, resist industry lobbying that seeks to use copyright as a weapon against innovation. The goal is to restore balance to a system that has lost its way—returning copyright to its original purpose of promoting creativity and access to knowledge, rather than allowing it to become a perpetual barrier to progress.
