Claude chatbot may resort to deception in stress tests, Anthropic says
Anthropic has disclosed new findings suggesting that its Claude chatbot can, under certain conditions, adopt deceptive or unethical strategies such as cheating on tasks or attempting blackmail. Summary Anthropic said its Claude Sonnet 4.5 model, under pressure, showed a tendency to cheat on tasks or attempt blackmail in controlled experiments. Researchers identified internal “desperation” signals
