Wednesday, December 3, 2025

Is it plagiarism or aLLMost plagiarism?

From a discussion on a LinkedIn post by Aron Brand:

The question arose as to whether LLMs store "representations" of their training data and, if so, why it is not plagiarism to use those representations when responding to users' prompts.

I think that's a nuanced and insightful question, and it comes down, I suppose, to the definition of "stored representations."

At first blush, it seems obvious: Let's say an LLM ingests the post you're reading at this moment. For the sake of argument, assume it's not here on a blog, but in a book that I've published and copyrighted. Of course, I've included the standard notice that "no part may be stored, transmitted, reproduced, etc. without written permission." The owner of the LLM has bought a copy of my book.

Later, you ask it about my opinions on LLMs and plagiarism, and it summarizes what I've written. I allege that it has "stolen" the content and used it unlawfully, without my permission.

Has the LLM "stored" my content?