Signal & Seam
Analysis

Computer use is becoming the real AI product race

Anthropic's Vercept acquisition and its Firefox security work with Mozilla point to the same conclusion: the next serious AI battle is not prettier chat, but systems that can reliably perceive, navigate, and act inside software.

For a while, "AI agents" mostly meant demos: a model clicking around a browser, fumbling through a form, then getting celebrated for not breaking anything too badly.

That phase is ending.

Two recent Anthropic moves make the shift visible. First, the company acquired Vercept, a startup built around the idea that useful agents depend on solving perception and interaction, not just language generation. Second, Anthropic published a security collaboration with Mozilla showing that Claude Opus 4.6 helped uncover 22 Firefox vulnerabilities in two weeks, 14 of them high severity by Mozilla's count.

Those are different stories on the surface. One is M&A. The other is a red-team case study. But they point to the same underlying reality: the economically important part of the agent market is moving from chat quality to software competence.

The point people keep missing

The strongest AI companies are not just trying to make models sound smarter. They are trying to make them *operationally useful* inside the mess of real software.

That means a system has to do at least four things well: perceive the current state of an interface, plan actionable steps, execute those steps reliably, and keep a human informed enough to take over when something goes wrong.

That is a much harder product problem than question answering. It is also a much more valuable one.

If an AI can summarize an article, you have a feature. If it can safely move through a browser, a codebase, an email client, an internal dashboard, or a desktop workflow, you have the beginning of a labor model.

Why the Vercept deal matters

Anthropic's announcement is unusually clear about what it thinks is scarce. Vercept, it says, was built around the thesis that making AI useful for complex tasks requires solving hard perception and interaction problems. That is the right framing.

Plenty of AI coverage still talks as if better agents will arrive automatically once models get generally smarter. But the Vercept acquisition suggests something more concrete: companies now believe computer use is a distinct capability stack worth buying for.

Anthropic also attached a useful performance signal to the announcement. It said its Sonnet models improved on the OSWorld benchmark from under 15% in late 2024 to 35.2% on OSWorld-Verified in early 2026. That number does not mean computer-use agents are solved. Quite the opposite. It implies two things at once:

1. Progress is real.
2. The gap between "impressive demo" and "reliable worker" is still large.

That is exactly why this category matters. A low but rising number in a hard environment is often more important than a high number on an easy benchmark.

The Mozilla result is more important than the press-release version

The more revealing story is not the acquisition. It is the Firefox work.

Anthropic says Claude Opus 4.6 discovered 22 vulnerabilities during a two-week collaboration with Mozilla. Mozilla's own write-up is even more useful because it explains why the effort mattered operationally: the reports came with minimal test cases that made them easy to verify and reproduce, and fixes began landing within hours. By Mozilla's count, the collaboration ultimately produced 22 CVEs, 14 of them high severity, along with 90 other bugs, most of which are now fixed.

That matters because the bottleneck in practical agent systems is rarely raw idea generation. It is whether the system can produce work in a form that another organization can actually use.

A model that says "I think there may be a problem somewhere around here" is interesting. A system that finds a bug, reduces it to a reproducible case, and hands a human team something actionable is crossing into real economic value.

This is the pattern to watch across the industry. The winning systems will not just produce plausible outputs. They will produce outputs that slot cleanly into existing workflows: tickets, patches, test cases, transaction steps, audit trails, and human checkpoints.
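To make the "slots cleanly into existing workflows" idea concrete, here is a minimal sketch of what a structured agent finding might look like before it becomes a ticket. The shape (`Finding`, `is_actionable`, `to_ticket`) is entirely hypothetical for illustration; it is not Anthropic's or Mozilla's actual reporting format.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class Finding:
    """One agent finding, packaged so a human team can act on it.

    Hypothetical shape for illustration only; not any real vendor's format.
    """
    title: str
    severity: str                 # e.g. "high", "moderate"
    component: str                # where the bug lives
    repro_steps: list[str] = field(default_factory=list)
    minimal_test_case: str = ""   # smallest input that triggers the bug

    def is_actionable(self) -> bool:
        # A finding is only useful downstream if someone can reproduce it.
        return bool(self.repro_steps) and bool(self.minimal_test_case)

    def to_ticket(self) -> dict:
        # Flatten into the kind of record a bug tracker can ingest.
        return asdict(self)

finding = Finding(
    title="Heap overflow in parser",
    severity="high",
    component="html/parser",
    repro_steps=["Load crafted page", "Trigger reparse"],
    minimal_test_case="<div><!-- truncated comment",
)
assert finding.is_actionable()
```

The design point is the `is_actionable` check: a vague hunch without a reproducible case fails it, which is exactly the line the article draws between "interesting" and "economically valuable."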

Computer use is not just a model layer

Google's Project Mariner makes the broader industry direction hard to ignore. DeepMind describes it as a system that can interpret goals, plan actionable steps, navigate websites, and keep the user informed while allowing takeover at any time. The examples are telling: finding job listings from a resume, moving from an email order to Taskrabbit, or turning a recipe in Drive into an Instacart cart.

Those are not "ask me anything" scenarios. They are workflow scenarios.

That distinction matters for business strategy. Once the product is a workflow rather than an answer, the stack becomes more complicated and more defensible. The company needs more than a strong model: it needs perception of the interface, planning, safe execution with human checkpoints, and clean integration into existing workflows.

That is why I think the market is slowly misnaming what is happening. "Agent" sounds like a thin wrapper around a model. In practice, the valuable thing is becoming a reliability stack for action.

This is why chat UX is no longer the whole story

OpenAI's Operator launch earlier this year, Google's Project Mariner work, and Anthropic's recent moves all point in roughly the same direction: labs are competing to own the layer where AI stops being a conversational novelty and starts behaving like software that can operate other software.

That does not mean fully autonomous agents are ready for broad trust. They are not. The Mozilla case is powerful precisely because it happened in a constrained, collaborative setup with expert humans in the loop. The same is true of most credible computer-use progress today: narrow domain, explicit oversight, measurable task structure.

But that should not be read as weakness. It should be read as product reality.

The near-term winners in AI are unlikely to be the companies that promise magical autonomy everywhere. More likely, they will be the ones that get three things right: choosing narrow, well-scoped domains; building explicit human oversight into the workflow; and structuring tasks so success is measurable.

In other words: not "the AI does everything," but "the AI can move through expensive software work without constantly needing rescue."
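The "explicit oversight with takeover at any time" pattern described above can be sketched in a few lines. This is a toy model under stated assumptions: the action names and the `approve` callback are invented for illustration and do not correspond to any real agent API.

```python
# Hypothetical oversight checkpoint: the agent executes low-risk actions
# automatically, but anything on the risky list is gated on human approval.
RISKY = {"submit_form", "send_email", "delete"}

def run_with_oversight(actions, approve):
    """Execute low-risk actions; hold risky ones unless `approve` says yes."""
    executed, held = [], []
    for action in actions:
        if action in RISKY and not approve(action):
            held.append(action)      # pause here for human takeover
        else:
            executed.append(action)  # safe to proceed autonomously
    return executed, held

plan = ["open_page", "fill_field", "submit_form"]
# A human (here simulated as "always decline") reviews the risky step.
done, paused = run_with_oversight(plan, approve=lambda a: False)
```

The useful property is that autonomy is the default only inside a bounded, pre-agreed envelope, which is the shape of every credible computer-use deployment the article describes.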

The business angle

This is where the story gets more interesting than model rankings.

Once computer use works well enough, it changes the shape of software value. Interfaces that were built for patient human operators start becoming substrates for machine operators. Enterprise software stops being just a tool your employees use and starts becoming a work environment your agents must survive.

That has consequences: interfaces start being judged on how navigable they are for machine operators, permissioning and oversight become product surface rather than afterthoughts, and the reliability stack becomes the layer companies compete to own.

If that sounds less like consumer AI and more like infrastructure, good. That is the point.

The easy version of the AI market is a race to answer more questions. The harder and more valuable version is a race to do more work.

Right now, Anthropic's Vercept deal and Mozilla collaboration look like evidence that the serious players know which race they are in.
