Do AI models succeed because they know facts, or because they reason well? And does the answer change depending on whether you're doing math or medicine?
Large Language Models (LLMs) excel at playing heroes but systematically fail at portraying villains. This research reveals a fundamental tension: safety alignment makes models so "good" that they cannot convincingly be "bad," even in fictional role-playing scenarios.