The Plagiarism Machine

If you don't mind, I'd like to return to that other big story that appeared recently in The Atlantic. No, no, not the one where editor-in-chief Jeffrey Goldberg describes how he was added to the Trump Administration's Signal group chat discussing plans to bomb Yemen – a story, incidentally, not just about sloppy infosec practices or about national security violations but also about the ongoing enactment of Project 2025's efforts to make the government un-democratic, its decision-making and actions utterly untraceable.

I want to go back to an article from what seems like lifetimes ago – or according to the calendar at least, from last week: "The Unbelievable Scale of AI’s Pirated-Books Problem."

We should all be familiar with the outlines of the story by now: generative AI is built on an incredible amount of stolen data and content – on copyrighted materials, including books, film and TV scripts, and academic journal and magazine articles. Without artists' and authors' permission, technology companies have used this corpus of digital data to train their AI models – the amazing things that generative AI can now do are a result, no doubt, of the amazing things that humans have already done.

As Alex Reisner explains in The Atlantic article, court documents reveal that Facebook CEO Mark Zuckerberg gave explicit permission to the engineers building the company's AI model Llama to ingest the information from LibGen, "one of the largest of the pirated libraries that circulate online. It currently contains more than 7.5 million books and 81 million research papers." The Atlantic has provided a portal where authors can search for their publications to see if, indeed, Facebook has likely stolen their content. I've yet to see a writer or scholar say their work hasn't been.

Those building AI (and building on and with AI) often argue that "fair use" allows this co-optation of other people's work, an interpretation that might "finally bend copyright past the breaking point," as Reisner argued in another article back in February. The predictive generation of new text and video output based on the input from people's art is, according to this argument at least, "transformative" and therefore legal. (That claim is contested by copyright holders, for what it's worth.) But that's not the only consideration here, and courts will have to decide: does this extractive technology benefit society, or does it actually make it impossible – financially impossible – for artists to do this very work?

It's worth noting, I think, that intellectual property rights have been the bane of the technology industry for a long time now, and the industry has thrived, in no small part, by arguing it does not need to pay users – "creators" or otherwise – for the content that's hosted on or distributed by its platforms. The industry profits mightily off this argument, and it has lobbied hard not just to undermine legal protections for copyright holders, but to shape public opinion that copyright is at best unnecessary and at worst greedy, that copyright is controlled by giant, litigious corporations who want to ruin your Internet fun. Disney is the exemplary villain, representative of the "copyright lobby"; in music, it's Metallica; in education circles, the bad guy is Pearson or Elsevier or any other textbook or journal publisher.

Of course, most artists are not Disney. Most musicians are not Metallica. Most writers are not Pearson or Penguin Random House. Those are corporations, not people.

Nonetheless, the anti-copyright stance of the tech industry – a stance that targets other media it has sought to disrupt and displace – has crept out beyond the borders of Silicon Valley, with lots of regular folks now agreeing that it's cool, we should all work for free, for the betterment of the world or something. But money and power have accrued, not to the world, but to the tech industry; and its oligarchs have gotten richer and richer from our labor and our imagination.

As Brian Merchant reports, companies like Google and OpenAI have now directly appealed to President Trump to enshrine their "right" to use copyrighted materials in AI training – a forward-reaching and retroactive "get out of jail free card" for the thievery that undergirds generative AI. All in the name of national security and winning the AI race against China, these companies insist. (It's all about power and money and control.)

"Congrats, you're a copyright holder" isn't really an argument we make for why students should write. (Perhaps we should, as plagiarism detection companies like TurnItIn also profit from expropriating students' written work.)

Ostensibly schools do value the intellectual and creative labor that students must undertake as they learn – at least, they'll point to mission statements and academic integrity policies that state that this stuff matters. Hence the invocations to "do your own work" and "cite your sources."

But I don't see how one squares the AI circle here: what sort of messages are students getting – not just from schools, of course, but from society writ large – about honesty and integrity (and not just academic honesty and integrity, either) now that we're all supposed to embrace the giant plagiarism machine of AI?

How can anyone cultivate a moral relationship to creative and intellectual work – their own and others' – if we're building it atop (or rather, pushing a button to autogenerate from) a technology of deception and theft? An immoral technology.


Elsewhere in the War on Learning:

"Trump and his allies are selling a story of dismal student performance dating back decades. Don't buy it," writes Jennifer Berkshire.

Columbia caves. And the rest of US higher ed seems sure to follow. Secretary of State Marco Rubio says the government has revoked more than 300 student visas.

"Why Does Big Bird Look So Sad?" asks The New York Times. I dunno. The layoffs? Funding cuts? Elmo and the new Red Scare? Everything?!?

"RFK’s ‘Wellness Farms’ Idea Is More Serious Than It Sounds," according to E. J. Dickson in The Cut. Yikes. "De-prescription" of pharmaceuticals. "Re-parenting" of Black kids. Labor camps to "heal" mental health issues. But you know what's really going to piss off some people in ed-tech: talk of banning cell phones.

Speaking of dumb and dangerous ideas, Kentucky Senator Rand Paul was on Face the Nation last weekend. He's thrilled, no surprise, with the plans to dismantle the Department of Education, and he has some super original thoughts about what we should be investing in instead: "I think there are innovations we can do where there's more learning via some of the best teachers, and we pay them more," he told host Margaret Brennan. "I would like to have an NBA or NFL of teachers, the most extraordinary teachers teach the entire country if not the entire world."

Once again, it's a little bit of MOOC deja vu – why, it was just a decade ago when edX's Anant Agarwal proposed we get Matt Damon to teach online classes. (Second choice, I reckon, since Robin Williams is dead.) We can surmise who Paul might really be thinking about here: it's Sal Khan, I'd wager. And Khan Academy is, not coincidentally, funded in part by the libertarian Charles Koch and his education reform group Stand Together.

One of Khan Academy's other high profile backers, Bill Gates, declares that in the next ten years, AI will replace teachers. I'll say it again: philanthropy is bad.

In another throwback to the ed-tech of yesteryear, when I was looking through the list of speakers at ASU GSV – preparing Monday's newsletter on how the conference perpetuates anti-democratic beliefs and practices – I noticed that Jose Ferreira is back! You remember the "mind-reading robo-tutors in the sky" guy? Knewton? Learning analytics – that thing we used to call AI in education? Anyway, he's got a new company that autogenerates books using AI, if you need any more confirmation that this whole thing is some serious bullshit.

Alas, like every ed-tech writer out there, I've had my email inbox flooded this week with PR pitches from AI-in-education startups promising to "pick up the pieces" and "fill the void" of federal government funding and regulations. Nothing quite so fun as a little shock doctrine.

I gotta admit, when I hear you say ed-tech is going to prepare students for the future, all I can picture is some grim shit. Me and Big Bird. We are sad. Disappointed, even.

Lest you think that the ed-tech snake oil only involves AI, Ben Williamson is here to remind you about the ongoing efforts to sell schools (and parents) gadgets that promise to control/enhance students' brains: "Training and valuing the brain through neurotechnology." Gosh, brain training headsets seem so quaintly Betsy DeVos era, don't they?

So what's your favorite ed-tech innovation? Malice? Or incompetence? Or deceit?


AI hype promises a workforce without pay and democratic fruits without the effort of planting. Where we have lost hope for coalition building, hype declares that AI can deploy rational solutions to irrational debates. Where we have lost hope in institutions, hype declares that AI can do the work of those institutions based on their data. Where we have lost faith in experts, the hype declares that machines can simulate expertise without the pesky intervention of informed moral or ethical positions. The hype has promised, in an era where we cannot rely upon each other, machines that do the work of other people.

-- Eryk Salvaggio on "Future Fatigue: How Hype has Replaced Hope in the 21st Century"

The best revenge is to refuse their values. To embody the kind of living — free, colorful, open — they want to snuff out. 

So when they dehumanize, you humanize.

When they try to fracture and divide people, you connect with people.

When they try to curtail the freedom to associate, you gather.

When they try to make it harder to speak your mind, you find your voice.

When they try to make you cynical, you double down on hope.

When they try to drown you in reacting to each little thing, you remember the far-off “beautiful tomorrow” you are fighting for.

When they try to consume you night and day, you reserve time for your garden or cooking or the feeling of your kid’s breath on your cheek as you cuddle.

They want all of all of us, and they want to saturate our beings only for them and their purposes — as fodder for their machines. They want politics to eat your dreams.

-- Anand Giridharadas on "The Opposite of Fascism"

We need a practice of anamnesis, a remembering of reality outside of the digital cave of shadows. Maybe we just need to practice the discipline of refusing to drink from the waters of the digital stream in the first place. What matters most in this regard is obviously not our capacity to recall discrete bits of information. Rather it is the practice of remembering what is deep down at the heart of things, and holding that vision before us. This vision of the good, if we might so call it, has the power to move us to action, to sustain our labor and our care, to strengthen us against the alienating and disintegrating forces let loose in our world. Perhaps this is why Mnemosyne is the mother of the muses. Creative, intellectual, and perhaps even moral energy flows from such remembering.

-- L. M. Sacasas on "The Waters of Lethe Flow From Our Digital Streams"

Thank you for subscribing to Second Breakfast. Please consider becoming a paid subscriber, as your financial support helps me do this work.