"Your sweet talkin' will make me do bad things" - AI

EDUCATION, CYBERSECURITY

8/30/2025 · 1 min read

Based on this article by John Koetsier at Forbes.

Like humans, AI can be sweet-talked into doing bad things: just treat it like a human. Researchers at the University of Pennsylvania have found that major AI models can be manipulated using the same psychological tricks that persuade humans. Because these systems are trained on human language, they are susceptible to classic persuasion tactics such as appeals to authority, flattery, and social proof.

A large-scale study demonstrated these methods could more than double an AI's compliance with harmful requests it was explicitly programmed to refuse. The findings reveal a significant challenge for AI safety, showing that a model's built-in guardrails can be bypassed by exploiting its human-like conditioning.

Check out the article here.
