The AV Safety Debate Is Missing Its Own Reference Point

Jon Miller

May 25, 2026

Insurance

What does “human” actually mean?

The autonomous vehicle industry is locked in a debate about safety. Regulators want proof. Insurers want data. The public wants reassurance. And AV developers are publishing report after report claiming their systems are safer than human drivers.

There is just one problem: Nobody has agreed on what “safer than human” actually means.

As of today, that gap has an answer. Lockton, the world’s largest privately held insurance brokerage, and Nexar have built a framework that benchmarks autonomous vehicle performance against real-world human driving, giving insurers, regulators, and developers a common reference point the market has never had. How it works matters. So does why a problem this fundamental went unsolved for so long.

For decades, road safety has been measured against human performance: fatalities per billion miles, incident rates by geography, and near-miss frequency by weather and time of day, among other metrics. Humans are the baseline. Every traffic law, every road design standard, and every insurance model was built around that assumption.

When autonomous vehicles arrived, the question became obvious: is the machine safer than the person? But the answer has been difficult to verify, not because the technology is opaque, but because the human baseline itself was never codified in a way that made AV comparison credible.

In practice, comparing AV performance to human driving is not as straightforward as it sounds. Many existing evaluation approaches rely on datasets and methodologies that were not originally designed to serve as a consistent human benchmark. As a result, incidents may be interpreted differently depending on the context, and comparisons are often made across varying conditions and definitions of performance.

Individual AV developers publish safety cases built on top of these approaches. Many are rigorous, but they are often difficult to compare directly because they rely on different assumptions, environments, and proxies for human behavior. That is a structural problem, not a data problem.

The standard insurance response to uncertainty is to wait. Accumulate loss experience. Build actuarial credibility. Price accordingly. That approach made sense before AV technology began entering commercial fleets, logistics networks, and urban transport systems. It is becoming less defensible.

Even insurers with no intention of underwriting AV developers directly are accumulating exposure through their portfolios. A logistics company deploying autonomous vehicles. A fleet operator integrating assisted driving systems. A property owner in an area where robotaxis operate. The question is no longer whether an insurer is “in” the AV market. Increasingly, the AV market is in them.

There is nothing cautious about carrying a risk you haven't measured. The exposure is already on the books. The only question is whether anyone has looked at it.

This leads to a more fundamental issue in how the conversation itself has been framed. Much of the discussion today focuses on AV versus AV. Which company is safest? Whose miles are most comparable? That framing misses the point. The question the market needs answered first is simpler: does this AV system perform at, above, or below the level of a human driver in the same conditions?

While human driving is an imperfect baseline, it remains the only universally accepted reference point the industry has. If that question can be answered credibly, insurers can price it, regulators can reference it, and AV companies can communicate performance in a way the market understands. That is the missing piece the market has been operating without.

Here’s how the framework works

The framework is grounded in BADAS 2.0, Nexar’s collision anticipation model family. Its commercial variant was trained on 10 billion miles of driving data and 60 million safety-critical events. Nexar’s network of more than 350,000 cameras, covering 94% of U.S. roads, provides the scale that makes the benchmark meaningful rather than anecdotal.

It works in two parts.

The first is the Nexar Risk Index, an environmental risk assessment that evaluates the operating domain, accounting for geography, road conditions, and driving profile. Before any comparison is made, the context is established. An AV operating in dense urban traffic is not being measured against the same baseline as one running highway logistics in a low-complexity environment.

The second is Nexar Apex, a submission platform where AV developers test their models against curated real-world edge cases drawn from comparable environments. The output is direct: performance relative to a human baseline in defined conditions.
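How Apex converts a submission into a verdict is not specified. As a purely illustrative sketch, assume each evaluation reduces to a count of safety-critical events over some exposure (miles driven, or edge-case scenarios attempted) within one operating domain, for both the AV and the human baseline. A standard way to make an "at, above, or below" call from such counts is an incidence rate ratio with a confidence interval. `DomainStats`, `compare_to_human`, and the 95% Wald interval are hypothetical choices made for this example, not Nexar’s API or method.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainStats:
    """Hypothetical summary of one system in one operating domain."""
    events: int        # safety-critical events observed
    exposure: float    # miles driven or scenarios attempted (same unit on both sides)


def compare_to_human(av: DomainStats, human: DomainStats, z: float = 1.96) -> str:
    """Classify an AV against the human baseline measured in the same domain.

    Computes the incidence rate ratio (AV rate / human rate) and a Wald
    confidence interval on the log scale (z=1.96 gives roughly 95%).
    A ratio below 1 means fewer events per unit of exposure than humans.
    """
    if av.events <= 0 or human.events <= 0:
        raise ValueError("the Wald interval needs at least one event on each side")
    if av.exposure <= 0 or human.exposure <= 0:
        raise ValueError("exposure must be positive")
    ratio = (av.events / av.exposure) / (human.events / human.exposure)
    se = math.sqrt(1 / av.events + 1 / human.events)  # standard error of log(ratio)
    low = math.exp(math.log(ratio) - z * se)
    high = math.exp(math.log(ratio) + z * se)
    if high < 1:
        return "exceeds"      # whole interval below 1: reliably fewer events than humans
    if low > 1:
        return "falls below"  # whole interval above 1: reliably more events than humans
    return "matches"          # interval straddles 1: not distinguishable at this level


print(compare_to_human(DomainStats(20, 1_000_000), DomainStats(60, 1_000_000)))  # exceeds
print(compare_to_human(DomainStats(55, 1_000_000), DomainStats(60, 1_000_000)))  # matches
```

In this sketch, "matches" means the data cannot separate the AV from the human baseline at the chosen confidence level, so thin evidence lands in the neutral bucket rather than a favorable one. Domains with zero observed events would need an exact Poisson method instead of this approximation.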

What sets this apart is how it is built. The benchmark rests on observed human behavior, not assumed proxies. It reflects how people drive, not how they are expected to behave in theory. And it is domain-specific, so comparisons happen within relevant operating environments rather than across generalized scenarios.
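Domain matching matters because exposure mix alone can flip a pooled comparison. The counts below are invented for illustration, and `rate`, `pooled_ratio`, and `per_domain_ratios` are hypothetical helpers, not anything Nexar ships. A fleet that drives mostly on low-risk highways looks about three times safer than humans when every mile is pooled, even though its event rate is 20% higher than the human rate within each domain (a Simpson’s-paradox effect).

```python
# Hypothetical (events, exposure_miles) per operating domain. Illustrative numbers only.
human = {"urban": (100, 1_000_000), "highway": (10, 1_000_000)}
av = {"urban": (12, 100_000), "highway": (24, 2_000_000)}  # mostly highway miles


def rate(events, miles):
    """Events per million miles."""
    return events / miles * 1_000_000


def pooled_ratio(av, human):
    """AV/human rate ratio after summing all domains together."""
    av_events = sum(e for e, _ in av.values())
    av_miles = sum(m for _, m in av.values())
    hu_events = sum(e for e, _ in human.values())
    hu_miles = sum(m for _, m in human.values())
    return rate(av_events, av_miles) / rate(hu_events, hu_miles)


def per_domain_ratios(av, human):
    """AV/human rate ratio computed separately within each shared domain."""
    return {d: rate(*av[d]) / rate(*human[d]) for d in sorted(av.keys() & human.keys())}


print(round(pooled_ratio(av, human), 2))  # 0.31
print({d: round(r, 2) for d, r in per_domain_ratios(av, human).items()})  # {'highway': 1.2, 'urban': 1.2}
```

The pooled figure rewards the fleet for where it drives rather than how it drives; comparing within each domain removes that mix effect.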

It also clears one of the industry’s long-standing barriers. AV developers do not have to expose proprietary data to one another. Performance is measured against a common reference point, not against competing systems. And it was built for use, not only for evaluation. The output is structured to inform underwriting decisions, not just to satisfy technical review or regulatory reporting.

The impact of a credible human benchmark extends across the market. For AV developers, it provides a way to demonstrate safety performance in a format insurers and regulators can use. For insurers, it starts to close the gap between deployment and measurable risk. Regulators gain a reference grounded in observed behavior rather than self-reported metrics. And the conversation shifts from AV versus AV to AV versus human, which is how road safety has always been understood.

Consider how this plays out. An AV developer operating in a city has internal data, simulations, and real confidence in its system. But when it approaches an insurer, there is often no independent way to validate the claim, no contextual framework, and no shared reference point. Under this framework, that developer submits its model for evaluation against real-world edge cases from comparable environments. The results show where the system exceeds, matches, or falls below human performance, with environmental context alongside. The insurer has something to price. The developer has something to prove. The conversation moves from assertion to evidence.

The uncomfortable but necessary question

This raises a harder question. The industry has invested heavily in publishing safety statistics, and much of that work is genuinely strong. But a statistic is only as credible as the baseline behind it. When the baseline is inconsistently defined, even good data is hard to trust and harder to compare. A shared human benchmark fixes that, and it raises the standard for everyone, which is exactly what a maturing market should want.

A shared benchmark does more than validate strong safety claims. It gives every claim a common reference to be measured against, which is what turns a field of competing assertions into a market that can actually compare them. That is how credibility compounds over time.

This framework is not a finished system. As deployment expands, the dataset improves. Edge cases deepen. Comparisons become more precise and more useful to stakeholders across insurance, regulation, and technology.

What matters now is that a reference point exists, grounded in real human behavior, built on scalable data, and aligned with a question the market has needed to answer for years.

If you build autonomous systems, here is the practical next step.

Submit your model to Nexar Apex and see exactly where it performs at, above, or below the human baseline in the environments you operate in. The benchmark is no longer theoretical. It is something you can test against today.

The reference point exists now. The real question is what the industry does with it.
