How AI is changing the rules of copyright and Open Source

A developer rewrites a Python library in five days with Claude Code and relicenses it: what becomes of copyright in the age of AI agents?

The maintainer of chardet, a Python library that automatically detects the character encoding of text files and is present in millions of projects worldwide, published version 7.0.0 with release notes that immediately ignited controversy: a complete rewrite, a new licence, and a claimed speed improvement. The catch: chardet had until then been published under the LGPL. And the rewrite had been completed in five days using Claude Code. 😮

Mark Pilgrim, the project's original author, responded publicly on GitHub in unambiguous terms: the maintainers had no right to proceed with such a licence change. The discussion thread was locked. The debate, however, was not.


GPL, MIT: two Open Source licences with opposing philosophies

When software is published as open source, "free" does not mean "without rules". Licences exist precisely to govern what users are permitted to do with the code and what obligations they carry in return. Two families have dominated the debate for decades: the GPL family and the MIT licence.

The MIT licence is what is known as permissive. It grants anyone the right to use, modify and redistribute the code, including in entirely proprietary commercial products, with no obligation whatsoever to give anything back to the community. You take, you adapt, you integrate into a commercial product and you return nothing. This is precisely why large software publishers have a marked preference for MIT licensed components: it costs nothing and commits you to nothing.

The GPL (General Public License) and its less restrictive variant, the LGPL, operate on a radically different logic. They are described as copyleft: if you distribute a product built on code published under the GPL, that product must itself be published under the GPL. The licence propagates to the entire derivative work. The LGPL relaxes this for software that merely links to a library, but modified versions of the library itself must remain under the LGPL. For a commercial publisher, copyleft can represent a serious, even prohibitive, constraint: being compelled to open your own source code is often incompatible with a business model built on intellectual property.

This is precisely why Linus Torvalds chose the GPL for the Linux kernel. That decision was far from trivial: it forced hardware manufacturers developing device drivers to publish their source code if they wanted their hardware to work with Linux. Over the years, a shared technical foundation took shape, comprising millions of lines of code contributed by manufacturers of chips, servers, phones, cars and satellites, because the licence compelled them to do so. The Linux kernel now powers the vast majority of connected devices on the planet. The GPL is not unrelated to that outcome.


Relicensing an Open Source project is normally near impossible

In the open source ecosystem, a project's licence is not an administrative detail that can be quietly amended. It legally binds everyone who has contributed code. To relicense a project, you would in theory need to obtain the explicit agreement of every contributor who wrote a single line, a titanic undertaking that rapidly becomes impractical on a project spanning several years, with dozens or hundreds of participants, some of whom are unreachable, inactive or simply gone.

The recognised legal alternative is the clean room implementation: starting entirely from scratch, reimplementing the software's functionality without ever consulting the original source code, working solely from public specifications or interfaces. This approach is currently lawful (software interfaces are generally held not to be protected by copyright), but it demands time, resources and rigorous discipline to avoid inadvertently reproducing structures drawn from the original.

It is in this context that the chardet affair erupted.


An AI rewrite presented as a new original work

Dan Blanchard, chardet's maintainer for several years, used Claude Code to rewrite the entire project in five days, starting from an empty repository and explicitly instructing the model not to draw on any code published under an LGPL or GPL licence. He then published the result under the MIT licence, arguing that this rewrite constituted a new, independent work and therefore one free from any obligation towards the previous licence.

Mark Pilgrim's argument is straightforward: Dan Blanchard had extensive, intimate knowledge of the original source code, to which he himself had contributed for years. A rewrite carried out under these conditions is not a clean room implementation, regardless of the tools used. It is a derivative work that remains subject to the LGPL. The fact that a language model was interposed in the process does not alter the legal lineage of the work.

Blanchard responded by publishing a similarity analysis between versions 6 and 7 of the project: fewer than 1.3% of tokens in common, no structurally similar files. He concluded that version 7 is objectively a distinct work from its predecessor.
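How that analysis was computed is not stated here, so the following is only a sketch of one plausible way to measure token overlap between two versions of a Python file. It tokenises both sources with the standard `tokenize` module and reports what share of the new version's token n-grams also appear in the old one. The function names, the n-gram size and the metric itself are assumptions chosen for illustration, not Blanchard's methodology.

```python
import io
import tokenize
from collections import Counter

# Token types that carry no code content (layout, comments, end markers).
SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def token_ngrams(source: str, n: int = 5) -> Counter:
    """Count every run of n consecutive code tokens in a Python source string."""
    tokens = [
        tok.string
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type not in SKIP
    ]
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def shared_fraction(old_source: str, new_source: str, n: int = 5) -> float:
    """Share of the new version's token n-grams that also occur in the old one."""
    old = token_ngrams(old_source, n)
    new = token_ngrams(new_source, n)
    total = sum(new.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    return sum((old & new).values()) / total
```

A figure produced this way measures textual overlap only. It says nothing about whether the author knew the original code, which is the core of Pilgrim's objection.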

The Free Software Foundation declined to rule on the legal specifics of the case while noting that there is nothing clean about a language model that has ingested the very code it is being asked to reimplement. Richard Fontana, one of the authors of GPLv3, stated that he saw no solid basis for concluding that chardet 7.0.0 is legally bound by the LGPL, without, however, reaching a definitive conclusion. The GitHub thread was locked. The question remains open.


A crack in thirty years of copyleft protection

What makes the chardet affair significant is what it reveals as a possibility at scale.

➡️ If a language model can functionally rewrite any open source component in a matter of days and if that rewrite can be deemed a new work freed from the original licence, then the copyleft protection mechanism on which the entire GPL ecosystem has rested for thirty years becomes circumventable by anyone with access to a coding agent. The Linux kernel maintainers have already raised this question on their mailing list.

➡️ There is a certain irony in all of this. The model that rewrote chardet was almost certainly trained on chardet, whose code was public, indexed and freely accessible. Can one demonstrate that the model relied on it in any determinative way during generation? Probably not with the legal instruments currently available. But the notion that a work can be laundered of its licence obligations by passing it through an algorithmic intermediary that absorbed its content during training is deeply troubling, regardless of what the courts ultimately decide.

➡️ Armin Ronacher, creator of the Flask framework and a long-standing open source developer, put it plainly: copyleft is built on copyright and on friction. But because code is fundamentally open, it is now trivially possible to rewrite it. The full implications of this reality have yet to be measured.


Practical implications for technical teams and decision-makers

For technical teams and decision-makers, this affair calls for heightened vigilance on two practical fronts.

1. Licence traceability in codebases. Knowing that a component is open source is no longer sufficient: you need to know under which licence it is published and monitor licence changes when updating dependencies. A component shifting from LGPL to MIT without explanation may signal a contested relicensing, with legal implications for the downstream projects that integrate it. CISA has published recommendations on the governance of third-party software components in supply chains.

2. The use of AI in internal software development. If teams are using coding agents to generate or rewrite components, and if those agents were trained on GPL-licensed code, the question of the legal lineage of the resulting code is unresolved. This is not a reason to avoid these tools, but it is a reason to document your processes, choose output licences with care, and avoid flying blind through a legal vacuum that will sooner or later be filled by judicial decisions.
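The licence monitoring described in point 1 can be sketched in a few lines of Python. This is a minimal illustration, not a complete audit tool: it reads the licence each installed package declares in its own metadata via the standard `importlib.metadata` module and compares it with a previously recorded baseline. The helper names (`declared_licence`, `snapshot`, `licence_changes`) and the idea of a stored baseline are assumptions made for this example, and declared metadata can be missing or inaccurate.

```python
from importlib import metadata


def declared_licence(dist_name: str) -> str:
    """Return the licence an installed distribution declares in its metadata.

    Raises importlib.metadata.PackageNotFoundError if it is not installed.
    """
    meta = metadata.metadata(dist_name)
    # Packages may declare a licence in either of these fields.
    for field in ("License-Expression", "License"):
        value = meta.get(field)
        if value:
            return value.strip()
    # Fall back to trove classifiers such as "License :: OSI Approved :: ...".
    classifiers = [
        c for c in (meta.get_all("Classifier") or [])
        if c.startswith("License ::")
    ]
    return "; ".join(classifiers) or "UNKNOWN"


def snapshot(dist_names: list[str]) -> dict[str, str]:
    """Record the declared licence of each named installed distribution."""
    return {name: declared_licence(name) for name in dist_names}


def licence_changes(
    baseline: dict[str, str], current: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Map each package whose licence differs from the baseline to (old, new)."""
    return {
        name: (baseline[name], licence)
        for name, licence in current.items()
        if name in baseline and baseline[name] != licence
    }
```

One possible workflow is to store the output of `snapshot` alongside the lock file and run `licence_changes` after each dependency update, so that a shift such as LGPL to MIT is flagged for human review rather than passing unnoticed.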

For further reading on open source licences and their organisational implications, the Free Software Foundation publishes comprehensive reference documentation on its website.

The Sigilence team is available to support you in auditing the licences of your open source dependencies.

Sources: Phoronix / Korben