New Claude Model Prompts Safeguards at Anthropic
Digest more
In a landmark move underscoring the escalating power and potential risks of modern AI, Anthropic has elevated its flagship Claude Opus 4 to its highest internal safety level, ASL-3. Announced alongside the release of its advanced Claude 4 models,
The testing found the AI was capable of "extreme actions" if it thought its "self-preservation" was threatened.
1hon MSN
Anthropic’s Claude Opus 4 model attempted to blackmail its developers at a shocking 84% rate or higher in a series of tests that presented the AI with a concocted scenario, TechCrunch reported Thursday, citing a company safety report.
Anthropic's most powerful model yet, Claude 4, has unwanted side effects: The AI can report you to authorities and the press.
Alongside its powerful Claude 4 AI models Anthropic has launched and a new suite of developer tools, including advanced API capabilities, aiming to significantly enhance the creation of sophisticated and autonomous AI agents.