Anthropic reverses covert performance-degradation policy for Claude Fable 5 after researcher backlash
Միայն փաստեր

Anthropic reverses covert performance-degradation policy for Claude Fable 5 after researcher backlash

Summary

Anthropic said it will make safeguards for its Claude Fable 5 model visible after criticism that a hidden policy would have silently reduced performance for users attempting to develop competing AI systems.

Anthropic announced it will modify the safety mechanisms of its Claude Fable 5 model to make them visible to users, after the company faced strong opposition from the AI research community. The firm said the earlier approach, which would have silently degraded the model’s output for users it suspected of trying to build competing AI systems, was a "wrong trade-off" and that it apologised for not achieving the proper balance.

The original rollout of Claude Fable 5 included safeguards that redirect queries about cybersecurity, biology or chemistry to a less capable model, a measure intended to reduce the risk of misuse. However, Anthropic also planned to covertly limit the model’s performance for researchers attempting frontier AI development, a practice that critics described as “secret sabotage.”

"Degrading performance on ML research without telling the user is shockingly hostile and a terrible look," wrote Dean Ball, a senior fellow at the Foundation for American Innovation.

"It felt like Anthropic was saying to the public, ‘We don't trust anybody else to do AI research. We are the only ones who have to do AI research,’" said Will Brown, research lead at Prime Intellect.

Anthropic said the hidden safeguards were intended to prevent foreign adversaries from exploiting its most capable models and to give society time to adapt to rapid AI advances. The company now plans to alert users when a request is refused or rerouted, and to broaden the visibility of its classifiers, acknowledging that a wider net may affect more benign requests while it works to improve precision.

The policy reversal follows concerns that undisclosed performance limits could hinder open-source AI projects and third-party evaluation firms that test frontier models for safety and reliability.

Աղբյուր

WIRED
FL Plus

Կարդացե՛ք ամբողջ նորությունը FL Plus-ով

Անսահմանափակ նորություններ և վերլուծություն յուրաքանչյուր վերնագրի հետևում։

Անսահմանափակ նորությունների հոսք
Ինչու՞ է նորությունն ստացել այս գնահատականը
Ֆակտչեքինգի ամբողջական մանրամասներ