AI-Rx - Your weekly dose of healthcare innovation
Estimated reading time: 4 minutes
TL;DR
Formal AI literacy training doesn't prevent automation bias. When physicians see confident AI recommendations, they anchor to them… even when trained to resist.
The solution isn't better education. It's better system design.
The Study
44 physicians completed 20 hours of formal AI literacy training.
They learned how LLMs work. How to spot errors. How to maintain independent judgment.
Then they were tested on 6 diagnostic cases.
Half received error-free suggestions from ChatGPT. Half received suggestions with deliberately inserted errors on 3 of 6 cases.
The physicians could voluntarily consult the AI. They could accept, modify, or reject any suggestion.
All the structural protections we're supposed to put in place.

What Happened
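Physicians exposed to erroneous AI recommendations had a mean diagnostic accuracy of 73.3%.
Control group (error-free recommendations): 84.9%.
That's a raw gap of 11.6 percentage points (84.9 - 73.3); the reduction reported from exposure to erroneous AI recommendations is 14 percentage points.
Top-choice diagnosis accuracy was even worse: 18.3 percentage points lower in the group that saw the erroneous recommendations.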
Even with AI literacy training.
Even with voluntary consultation.
Even with full autonomy to reject recommendations.

Why This Matters
This isn't an education problem. It's a cognitive bias problem.
Automation bias is a cognitive phenomenon deeper than knowledge. Your brain wants to trust the machine once you've been told it's trustworthy.
Telling physicians "verify everything" doesn't overcome that.
Training teaches you what to do. Automation bias operates at the level of whether you do it.
What We've Been Assuming (Wrong)
"If we train physicians to use AI critically, they'll catch errors."
The data says otherwise.
Trained physicians still demonstrate substantial automation bias.
The gap between what we know and what we do under cognitive load is real.

The Real Solution
You can't solve automation bias with training alone.
You need system design that makes verification mandatory, not voluntary.
Workflow friction that forces independent evaluation before AI recommendations reach clinical decisions.
Governance that assumes automation bias will happen and builds protections accordingly.
Examples:
→ AI recommendations don't populate the default action; the clinician must actively accept them
→ AI recommendations are hidden until the clinician has documented their own thinking
→ High-risk recommendations (medication changes, critical actions) require explicit override documentation
→ Audit trails track when clinicians followed AI vs. rejected it, with feedback loops
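For the builders reading this, here is a minimal sketch of how those four guardrails could fit together in code. It is an illustration under stated assumptions, not a reference implementation: the `Encounter` class, its method names, and `follow_rate` are all hypothetical, and I'm reading "explicit override documentation" as "any decision on a high-risk recommendation needs a written rationale."

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Encounter:
    """Hypothetical record for one AI-assisted clinical decision."""
    ai_recommendation: str
    high_risk: bool = False
    clinician_assessment: Optional[str] = None
    audit_log: list = field(default_factory=list)

    def _log(self, event: str, note: str = "") -> None:
        # Every step is timestamped so follow/reject patterns can be reviewed.
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append({"event": event, "note": note, "at": stamp})

    def document_assessment(self, text: str) -> None:
        if not text.strip():
            raise ValueError("independent assessment cannot be empty")
        self.clinician_assessment = text
        self._log("assessment_documented")

    def reveal_recommendation(self) -> str:
        # The AI output stays hidden until the clinician has written their own view.
        if self.clinician_assessment is None:
            raise PermissionError("document your own assessment first")
        self._log("recommendation_revealed")
        return self.ai_recommendation

    def decide(self, accept: bool, rationale: str = "") -> str:
        # `accept` has no default value: nothing is accepted without an explicit choice.
        if self.clinician_assessment is None:
            raise PermissionError("document your own assessment first")
        # Assumption: high-risk decisions always need a written rationale.
        if self.high_risk and not rationale.strip():
            raise ValueError("high-risk decisions require a documented rationale")
        self._log("accepted" if accept else "rejected", rationale)
        return self.ai_recommendation if accept else self.clinician_assessment


def follow_rate(encounters) -> Optional[float]:
    """Share of logged decisions where the clinician followed the AI."""
    decisions = [
        entry["event"]
        for enc in encounters
        for entry in enc.audit_log
        if entry["event"] in ("accepted", "rejected")
    ]
    if not decisions:
        return None
    return decisions.count("accepted") / len(decisions)


if __name__ == "__main__":
    enc = Encounter(ai_recommendation="AI suggestion text", high_risk=True)
    enc.document_assessment("clinician's own working diagnosis")
    print(enc.reveal_recommendation())
    print(enc.decide(accept=False, rationale="conflicts with exam findings"))
    print(follow_rate([enc]))
```

The design choice that matters is ordering: the recommendation is unreachable until the independent assessment exists, so verification is enforced by the workflow rather than left to willpower. A follow rate that drifts toward 100% in the audit data would be one signal worth feeding back to clinicians.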
The Deployment Question
If physicians with AI literacy training still show accuracy reductions of 14 to 18 percentage points when exposed to erroneous AI recommendations…
Are we ready to deploy AI in clinical environments where errors can happen?
Or do we need to redesign our workflows first?
Talk soon,
Bhargav
P.S. This connects directly to the evidence infrastructure questions in The Future of AI in Healthcare: how do we build systems where human judgment and AI capability work together by design, not by hope?