iRecord, 12 June 2026

AI without data doesn't exist, and that's the actual problem

Billions of images and the question of where they come from.

A simple truth nobody says aloud.

We talk about artificial intelligence as if it were something magical. An algorithm that thinks, almost feels. But all AI really does is recognize patterns in data. Billions of data points. And each of those billions came from somewhere. From someone.

Models like ChatGPT are trained on data scraped from the internet. That sounds abstract. But concretely it means someone took your tweet, your blog, your photograph and used it to build these systems without paying you and without asking. You are data.

Cathy O'Neil documented in Weapons of Math Destruction how algorithms are not neutral. They are only as neutral as the data they are trained on. Suppose you train an AI on images labeled "criminal," and 70 percent of those images show Black Americans, not because that group commits more crime but because it is disproportionately arrested. Then your AI doesn't learn "who is criminal." It learns "Black men are probably suspicious." That's not a bug. It's a feature of the data.
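
To make that mechanism concrete, here is a minimal sketch in Python. Every number in it is invented for illustration: it assumes two hypothetical groups with exactly the same number of offenders, and it assumes one group is arrested far more often than the other. The function names and the "model," which does nothing more than count frequencies, are my own simplification, not how any real system is built.

```python
# Illustrative sketch only: every number below is invented, not real crime data.

# Assumption: both groups contain the same number of people who committed
# the same offense, so the true rate is identical.
OFFENDERS_PER_GROUP = 500

# Assumption: how many of every 100 offenders in each group end up arrested,
# and therefore photographed into the training set.
ARRESTS_PER_100_OFFENDERS = {"group_a": 70, "group_b": 30}


def build_arrest_records():
    """Count how many training images each group contributes."""
    return {
        group: OFFENDERS_PER_GROUP * per_100 // 100
        for group, per_100 in ARRESTS_PER_100_OFFENDERS.items()
    }


def learned_share(records):
    """A naive frequency 'model': the share of training images per group."""
    total = sum(records.values())
    return {group: count / total for group, count in records.items()}


records = build_arrest_records()
shares = learned_share(records)

for group in records:
    print(f"{group}: {records[group]} images, {shares[group]:.0%} of the dataset")
```

Under these assumptions the two groups behave identically, yet the dataset ends up 70 percent group_a and 30 percent group_b. Anything trained on it inherits the arrest pattern, not the behavior.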

OpenAI and others plow through billions of images. Not out of scientific curiosity, but because more data means better results, better products, more money. The datasets grew not because people suddenly produced billions of new images, but because companies laid claim to every image already on the internet.

The question nobody asks is simple: who do those billions of images actually belong to? Who gave permission? Which law says that when I put a photo on Instagram, I agree that it can be used forever to train the next generation of AI?

The answer is: none. We never gave meaningful permission. At most we accepted it, literally, when we clicked that button without reading the terms. AI companies simply took what was available. And it passes as legal because the internet is public, and public apparently means "I can use it."

What disappears here is not something you can see. It goes much deeper. What disappears is the foundation of consent: the idea that when you share something of yourself, you decide what happens with it. AI erodes that foundation not violently, but gradually, in algorithms, in scale, in abstraction.

I built iRecord because this abstraction bothers me. So does the thought that millions of Dutch people hand over their identity to centrally managed databases, leaving digital traces of themselves over which they have no control. It's not paranoia. It's simply the logical consequence of how tech works now.

The question for AI is not whether it can. It can. The question is whether we want it to when you, whose images train it, end up owning nothing.


Sources: Cathy O'Neil, 'Weapons of Math Destruction' (Crown Publishers, 2016); MIT Media Lab studies on dataset bias; OpenAI data sourcing documentation