Synthetic data - wash on a gentle cycle

THE MODELS

8/14/2025 · 1 min read

Based on this article from Chris Stokel-Walker at Fast Company.

As the AI industry faces a shortage of high-quality human data for training, companies like OpenAI are turning to synthetic data (content generated by AI itself) to build future models. However, critics from the creative industries call this practice "data laundering." They argue it lets companies obscure their use of copyrighted material by training on it and then generating "clean" variations.

Experts quoted in the article agree that this approach doesn't solve the core ethical problem, because the synthetic data is still derived from models originally trained on human work, often without permission or compensation. The fundamental issue remains: people's creative work is being used to build AI systems that will ultimately compete directly with them.

Check out the article to learn more.
