Synthetic data - wash on a gentle cycle
THE MODELS


Based on this article from Chris Stokel-Walker at Fast Company.
As the AI industry faces a shortage of high-quality human data for training, companies like OpenAI are turning to synthetic data (content generated by AI itself) to build future models. However, critics in the creative industries label this practice "data laundering," arguing that it lets companies obscure their use of copyrighted materials by training on them and then generating "clean" variations.
Experts agree that this approach doesn't solve the core ethical problem, because the synthetic data is still derived from models originally trained on human work, often without permission or compensation. The fundamental issue remains that people's creative work is being exploited to build AI systems that will ultimately compete directly with the people who made it.
Check out the article to learn more.


