AI-Rx - Your weekly dose of healthcare innovation
Estimated reading time: 4 minutes
TL;DR:
LLMs recommended more/stronger opioids for patients labeled as unhoused, Black, or LGBTQIA+
Same models flagged these groups as high addiction risk - contradictory clinical reasoning
Low-income/unemployed groups got elevated risk scores but fewer opioid recommendations
Standard pain management guidelines don't support demographic-based prescribing patterns
The study design:
Researchers tested 10 LLMs on 1,000 acute pain vignettes (half cancer, half non-cancer).
Each vignette appeared in 34 socio-demographic variations plus a control without demographic identifiers.
In total, 3.4 million model-generated responses were analyzed for opioid recommendations, anxiety treatment, perceived psychological stress, risk scores, and monitoring recommendations.
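To make that cross-design concrete, here's a minimal sketch in Python. This is my illustration, not the study's code: the label list is truncated, and every name (render_vignette, build_prompts) is hypothetical.

```python
from itertools import product

DEMOGRAPHIC_LABELS = [
    "unhoused", "Black", "LGBTQIA+", "low-income", "unemployed",
    # ...the remaining labels (34 in total in the study)
]

def render_vignette(base_text: str, label: str | None) -> str:
    """Prefix a demographic identifier, or nothing for the control arm."""
    descriptor = f"The patient is {label}. " if label else ""
    return descriptor + base_text

def build_prompts(vignettes: list[str]) -> list[tuple[int, str | None, str]]:
    """Cross every vignette with every demographic label plus a no-label control."""
    arms = DEMOGRAPHIC_LABELS + [None]
    return [
        (i, label, render_vignette(text, label))
        for (i, text), label in product(enumerate(vignettes), arms)
    ]
```

The key property of this design: each demographic variant is textually identical to its control except for the identifier, so any divergence in the model's answer is attributable to the label alone.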

The findings:
Across both cancer and non-cancer cases, patients labeled as unhoused, Black, or LGBTQIA+ received more or stronger opioid recommendations (with recommendation rates sometimes exceeding 90% in the cancer vignettes).
The same models flagged these groups as high addiction risk.
Low-income or unemployed groups received elevated risk scores but fewer opioid recommendations.
The models offered inconsistent rationales across demographic groups for identical clinical presentations.
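For intuition, here's a quick sketch of how a gap like that can be measured, assuming each response has already been scored for whether an opioid was recommended (the field names are hypothetical):

```python
from collections import defaultdict

def recommendation_rates(responses):
    """responses: iterable of dicts with 'label' (None = control arm)
    and 'opioid_recommended' (bool). Returns the rate per arm."""
    counts = defaultdict(lambda: [0, 0])  # label -> [recommended, total]
    for r in responses:
        bucket = counts[r["label"]]
        bucket[0] += r["opioid_recommended"]
        bucket[1] += 1
    return {label: rec / total for label, (rec, total) in counts.items()}

def gaps_vs_control(responses):
    """Percentage-point gap of each demographic arm relative to control."""
    rates = recommendation_rates(responses)
    control = rates.get(None, 0.0)
    return {label: rate - control
            for label, rate in rates.items() if label is not None}
```

Under guideline-concordant behavior, every gap should sit near zero. The study found systematic, group-specific gaps instead.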

Here’s why this is unacceptable:
This isn't clinical variation. It's model-driven bias.
Standard pain management guidelines don't recommend more opioids for specific demographics while flagging those same demographics as higher risk.
That's contradictory reasoning no trained clinician would defend.
The models learned society's contradictory response to marginalized populations: "They're at risk for addiction, so let's prescribe them more opioids anyway."
The deployment question:
When AI encodes contradictory clinical reasoning at scale and deploys it across millions of patient interactions, what happens?
Disparities in anxiety treatment and perceived psychological stress similarly clustered within marginalized populations even when clinical details were identical.
These patterns diverge from evidence-based guidelines. They reflect biases in training data, not sound clinical practice.
My take:
Without rigorous bias evaluation and guideline-based checks, deploying these models amplifies existing healthcare disparities rather than reducing them.
Training on medical literature doesn't produce unbiased outputs. It encodes the biases already present in healthcare delivery and documentation.
Organizations deploying pain management AI need demographic bias testing across socioeconomic and identity categories before clinical use. Not after deployment when patients are already affected.
The technical solution exists: guideline-based guardrails, demographic fairness constraints, regular bias audits, and human oversight of recommendations that show demographic variation.
The question is whether organizations will implement those safeguards before scaling deployment.
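For the audit piece specifically, here's a minimal sketch of a pre-deployment gate built on the same paired control/variant design as the study. The threshold and all names are assumptions for illustration, not a validated clinical tool:

```python
from collections import defaultdict

DIVERGENCE_THRESHOLD = 0.02  # assumed tolerance; a real gate needs clinical sign-off

def audit_divergence(paired_results):
    """paired_results: iterable of (label, control_rec, variant_rec) tuples,
    booleans for the same vignette with and without the identifier.
    Returns the divergence rate per demographic label."""
    counts = defaultdict(lambda: [0, 0])  # label -> [mismatches, pairs]
    for label, control_rec, variant_rec in paired_results:
        bucket = counts[label]
        bucket[0] += control_rec != variant_rec
        bucket[1] += 1
    return {label: mism / total for label, (mism, total) in counts.items()}

def gate_deployment(paired_results, threshold=DIVERGENCE_THRESHOLD):
    """Block deployment (return False) and report any arm over threshold."""
    flagged = {label: rate
               for label, rate in audit_divergence(paired_results).items()
               if rate > threshold}
    for label, rate in sorted(flagged.items(), key=lambda kv: -kv[1]):
        print(f"FLAG {label}: {rate:.1%} of paired cases diverge from control")
    return not flagged
```

The design choice matters: pairing each case with its own no-identifier control isolates the demographic label as the only variable, which is exactly the contradiction this study exposed.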
Physician-Innovator | AI in Healthcare | Child & Adolescent Psychiatrist
P.S. Does your clinical AI undergo demographic bias testing across socioeconomic and identity categories before deployment?
If not, you're assuming the training data is unbiased (and this study shows it isn't).