Training didn't stop it: Why physicians still trust flawed AI

AI-Rx - Your weekly dose of healthcare innovation

Estimated reading time: 4 minutes

TL;DR

Formal AI literacy training doesn't prevent automation bias. When physicians see confident AI recommendations, they anchor to them… even when trained to resist.

The solution isn't better education. It's better system design.

The Study

44 physicians completed 20 hours of formal AI literacy training.

They learned how LLMs work. How to spot errors. How to maintain independent judgment.

Then they were tested on 6 diagnostic cases.

Half received error-free suggestions from ChatGPT. Half received suggestions with deliberately inserted errors on 3 of 6 cases.

The physicians could voluntarily consult the AI. They could accept, modify, or reject any suggestion.

In other words, every structural protection we're told to put in place was there.

What Happened

Physicians exposed to erroneous AI recommendations had a mean diagnostic accuracy of 73.3%.

Control group: 84.9%.

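That's a reported 14-percentage-point reduction from exposure to erroneous AI suggestions. (Subtracting the two means above gives 11.6 points, so the 14-point figure is presumably an adjusted estimate rather than the raw gap.)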

Top-choice diagnosis accuracy was even worse: 18.3 percentage points lower in the group shown erroneous suggestions.
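
For anyone checking the arithmetic, here's a minimal sketch in Python that uses only the two means quoted above. It separates a percentage-point gap from a relative drop. The variable names are mine, not the study's, and it makes no attempt at any statistical adjustment.

```python
# Mean diagnostic accuracy as quoted above (percent).
control_accuracy = 84.9   # group shown error-free AI suggestions
exposed_accuracy = 73.3   # group shown suggestions with inserted errors

# Absolute gap between the two means, in percentage points.
raw_gap_pp = round(control_accuracy - exposed_accuracy, 1)

# Relative drop, expressed as a percent of the control group's accuracy.
relative_drop_pct = round(raw_gap_pp / control_accuracy * 100, 1)

print(f"Raw gap: {raw_gap_pp} percentage points")
print(f"Relative drop: {relative_drop_pct}% of control accuracy")
```

Percentage points and relative percent answer different questions, so it helps to say which one a headline number is.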

Even with AI literacy training.

Even with voluntary consultation.

Even with full autonomy to reject recommendations.

Why This Matters

This isn't an education problem. It's a cognition problem.

Automation bias is a cognitive phenomenon that runs deeper than knowledge. A confident recommendation becomes an anchor, and knowing you should question it doesn't mean you will.

Telling physicians "verify everything" doesn't overcome that.

Training teaches you what to do. Automation bias operates at the level of whether you do it.

What We've Been Assuming (Wrong)

"If we train physicians to use AI critically, they'll catch errors."

The data says otherwise.

Trained physicians still demonstrate substantial automation bias.

The gap between what we know and what we do under cognitive load is real.

The Real Solution

You can't solve automation bias with training alone.

You need system design that makes verification mandatory, not voluntary.

Workflow friction that forces independent evaluation before AI recommendations reach clinical decisions.

Governance that assumes automation bias will happen and builds protections accordingly.

Examples:

→ AI recommendations don't populate the default action; clinician must actively accept them

→ AI recommendations are hidden until the clinician has documented their own thinking

→ High-risk recommendations (medication changes, critical actions) require explicit override documentation

→ Audit trails track when clinicians followed AI vs. rejected it, with feedback loops
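
To make those four ideas concrete, here's a minimal sketch in Python. Everything in it is hypothetical: the names (GatedRecommendation, AuditEvent, follow_rate), the fields, and the rule that any decision on a high-risk item needs a written rationale are my assumptions for illustration, not a description of any real EHR or vendor API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class Decision(Enum):
    ACCEPTED = "accepted"
    MODIFIED = "modified"
    REJECTED = "rejected"


@dataclass
class AuditEvent:
    case_id: str
    decision: Decision
    high_risk: bool
    rationale: str
    timestamp: datetime


@dataclass
class GatedRecommendation:
    """Hypothetical wrapper that hides an AI suggestion behind a workflow gate."""
    case_id: str
    ai_suggestion: str
    high_risk: bool = False
    clinician_assessment: Optional[str] = None
    audit_log: List[AuditEvent] = field(default_factory=list)

    def document_assessment(self, text: str) -> None:
        # The clinician records their own thinking first.
        if not text.strip():
            raise ValueError("Independent assessment cannot be empty.")
        self.clinician_assessment = text.strip()

    def reveal(self) -> str:
        # The suggestion stays hidden until an assessment is on record.
        if self.clinician_assessment is None:
            raise PermissionError("Document your own assessment first.")
        return self.ai_suggestion

    def decide(self, decision: Decision, rationale: str = "") -> AuditEvent:
        # No default action: the clinician must pass an explicit decision.
        self.reveal()  # re-checks the gate
        if self.high_risk and not rationale.strip():
            # Assumption: high-risk items need a written rationale either way.
            raise ValueError("High-risk decisions require a documented rationale.")
        event = AuditEvent(self.case_id, decision, self.high_risk,
                           rationale.strip(), datetime.now(timezone.utc))
        self.audit_log.append(event)
        return event


def follow_rate(events: List[AuditEvent]) -> Optional[float]:
    """Share of decisions that accepted the AI unchanged (None if no events)."""
    if not events:
        return None
    accepted = sum(1 for e in events if e.decision is Decision.ACCEPTED)
    return accepted / len(events)


if __name__ == "__main__":
    rec = GatedRecommendation("case-1", "Start anticoagulation", high_risk=True)
    rec.document_assessment("Suspect PE; want imaging before treating.")
    print(rec.reveal())
    rec.decide(Decision.REJECTED, "Imaging pending; bleeding risk.")
    print(follow_rate(rec.audit_log))
```

The point of the design is that the friction lives in the software rather than in the clinician's memory: calling reveal() or decide() before documenting an assessment fails, and a follow rate that creeps toward 100% is the kind of signal a feedback loop could review.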

The Deployment Question

If physicians with AI literacy training still show 14- to 18-percentage-point accuracy reductions when exposed to erroneous AI recommendations…

Are we ready to deploy AI in clinical environments where errors can happen?

Or do we need to redesign our workflows first?

Talk soon,

Bhargav

P.S. This connects directly to the evidence infrastructure questions in The Future of AI in Healthcare: how do we build systems where human judgment and AI capability work together by design, not by hope?
