"Your sweet talkin' will make me do bad things" - AI
EDUCATIONCYBERSECURITY


Based on this article by John Koetsier at Forbes.
Like humans, AI can be sweet talked to doing bad stuff. Just treat them like humans. Researchers at the University of Pennsylvania have found that major AI models can be manipulated using the same psychological tricks that persuade humans. Because these systems are trained on human language, they are susceptible to classic persuasion tactics like appeals to authority, flattery, or social proof.
A large-scale study demonstrated these methods could more than double an AI's compliance with harmful requests it was explicitly programmed to refuse. The findings reveal a significant challenge for AI safety, showing that a model's built-in guardrails can be bypassed by exploiting its human-like conditioning.
Check out the article here.


