Willow Ventures

Anthropic Launches Claude Sonnet 4.5 with New Coding and Agentic State-of-the-Art Results | Insights by Willow Ventures

Anthropic Launches Claude Sonnet 4.5 with New Coding and Agentic State-of-the-Art Results | Insights by Willow Ventures

Anthropic Launches Claude Sonnet 4.5: A Game Changer for Software Engineering

Anthropic has unveiled the Claude Sonnet 4.5, a major update that aims to redefine end-to-end software engineering capabilities and real-world computer interaction. This latest version not only introduces significant features but also maintains the same competitive pricing structure as its predecessor.

What’s Actually New in Claude Sonnet 4.5?

  • SWE-bench Verified Record: Claude Sonnet 4.5 achieves an impressive 77.2% accuracy on the rigorous 500-problem SWE-bench Verified dataset. This figure rises to 78.2% in a 1M-context setting, while intensive computing boosts accuracy to 82.0%.

  • Computer-Use State-of-the-Art: With a 61.4% performance score on OSWorld-Verified, Sonnet 4.5 shows significant improvement from Sonnet 4’s 42.2%, underscoring enhanced tool control and user interface manipulation.

  • Long-Horizon Autonomy: The update allows for over 30 hours of continuous focus on multi-step coding tasks, a notable leap in agent reliability and practical utility.

  • Enhanced Reasoning and Math: The latest release boasts “substantial gains” in reasoning and mathematical evaluations, further solidifying its status in AI safety and performance.

New Features for Agents

Sonnet 4.5 specifically addresses vulnerabilities in real agents, focusing on extended planning and reliable tool orchestration. The Claude Agent SDK allows developers to replicate the same efficient frameworks used internally by Anthropic, optimizing long-running tasks.

The update’s 19-point gain on OSWorld-Verified suggests improved navigation, spreadsheet handling, and web flows, making Sonnet 4.5 an ideal choice for enterprises seeking effective robotic process automation (RPA) solutions.

Where You Can Run Claude Sonnet 4.5

  • Anthropic API & Apps: Access the model via the claude-sonnet-4-5 ID, maintaining price parity with Sonnet 4. New features like file creation and code execution are added in paid tiers.

  • AWS Bedrock: Available with features that enable extended agent interactions, memory/context applications, and operational controls.

  • Google Cloud Vertex AI: Now in General Availability, it supports multi-agent orchestration and offers various features like provisioned throughput.

  • GitHub Copilot: The public preview includes deployment across Copilot Chat, providing further integration options for organizations.

Summary

With a 77.2% SWE-bench Verified score and a 61.4% OSWorld-Verified achievement, Claude Sonnet 4.5 positions itself as a robust solution for complex, tool-intensive workflows. As teams explore its capabilities, the design closely aligns with present-day production challenges, promising noteworthy advancements in software engineering tasks.

Related Keywords

  • Claude Sonnet 4.5
  • Software Engineering AI
  • Anthropic AI Updates
  • Reasoning and Math Evaluation
  • Robotic Process Automation (RPA)
  • AI Code Generation
  • Cloud AI Services


Source link