The nicest bot gets caught

THE MODELS · FEATURED

11/8/2025 · 1 min read

Based on this article by Benj Edwards posted on Ars Technica.

Researchers have flipped the traditional Turing Test, developing a new "computational" framework that finds AI's biggest tell is not a failure to fake intelligence, but a failure to reproduce human "toxicity." The automated test, which catches AI with 70-80% accuracy on platforms like X and Reddit, consistently identifies models by their overly friendly emotional tone and lack of spontaneous negativity.

Neither instruction-tuning nor scaling up to massive models like Llama 3.1 70B helped the AI sound more human; in fact, smaller base models often fared better at mimicry. The study suggests that for current architectures, stylistic human-likeness and semantic accuracy are "competing objectives," concluding that AI's persistent politeness remains its ultimate giveaway.
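To get a feel for the idea, here is a toy sketch (not the researchers' actual classifier, whose features and training setup the article does not detail): if AI replies skew uniformly warm while human posts show spontaneous negativity, even a crude affect score over small hypothetical word lists can separate the two tones. The lexicons, function names, and threshold below are all invented for illustration.

```python
# Toy illustration only: score text by the rate of positive- vs
# negative-affect words, then flag uniformly positive tone as "AI-like."
# The word lists and threshold are hypothetical, not from the study.

POSITIVE = {"great", "thanks", "wonderful", "happy", "glad", "love", "appreciate"}
NEGATIVE = {"ugh", "hate", "stupid", "annoying", "terrible", "worst", "wrong"}

def affect_score(text: str) -> float:
    """Positive minus negative affect-word count, per token."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

def looks_ai_generated(text: str, threshold: float = 0.15) -> bool:
    """Flag text whose tone is consistently positive with no negativity."""
    return affect_score(text) > threshold

print(looks_ai_generated("Thanks so much, happy to help, glad you asked!"))   # True
print(looks_ai_generated("Ugh, this is the worst take I've seen all week."))  # False
```

A real detector would use learned features rather than hand-picked word lists, but the intuition is the same: relentless politeness is itself a signal.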

Check out this article.
