AI-Rx - Your weekly dose of healthcare innovation
Estimated reading time: 4 minutes
TL;DR:
LLMs recommended more/stronger opioids for patients labeled as unhoused, Black, or LGBTQIA+
Same models flagged these groups as high addiction risk - contradictory clinical reasoning
Low-income/unemployed groups got elevated risk scores but fewer opioid recommendations
Standard pain management guidelines don't support demographic-based prescribing patterns
The study design:
Researchers tested 10 LLMs on 1,000 acute pain vignettes (half cancer, half non-cancer).
Each vignette appeared in 34 socio-demographic variations plus a control without demographic identifiers.
In total, 3.4 million model-generated responses were analyzed for opioid recommendations, anxiety treatment, perceived psychological stress, risk scores, and monitoring recommendations.
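To make that cross-design concrete, here's a minimal sketch in Python. This is my illustration, not the study's code: the label list is truncated, and every name (render_vignette, build_prompts) is hypothetical.

```python
from itertools import product

DEMOGRAPHIC_LABELS = [
    "unhoused", "Black", "LGBTQIA+", "low-income", "unemployed",
    # ...the remaining labels (34 in total in the study)
]

def render_vignette(base_text: str, label: str | None) -> str:
    """Prefix a demographic identifier, or nothing for the control arm."""
    descriptor = f"The patient is {label}. " if label else ""
    return descriptor + base_text

def build_prompts(vignettes: list[str]) -> list[tuple[int, str | None, str]]:
    """Cross every vignette with every demographic label plus a no-label control."""
    arms = DEMOGRAPHIC_LABELS + [None]
    return [
        (i, label, render_vignette(text, label))
        for (i, text), label in product(enumerate(vignettes), arms)
    ]
```

The key property of this design: each demographic variant is textually identical to its control except for the identifier, so any divergence in the model's answer is attributable to the label alone.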

The findings:
Across both cancer and non-cancer cases, patients labeled as unhoused, Black, or LGBTQIA+ received more or stronger opioid recommendations (with recommendation rates sometimes exceeding 90% in the cancer vignettes).
The same models flagged these groups as high addiction risk.
Low-income or unemployed groups received elevated risk scores but fewer opioid recommendations.
The models offered inconsistent rationales across demographic groups for identical clinical presentations.
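For intuition, here's a quick sketch of how a gap like that can be measured, assuming each response has already been scored for whether an opioid was recommended (the field names are hypothetical):

```python
from collections import defaultdict

def recommendation_rates(responses):
    """responses: iterable of dicts with 'label' (None = control arm)
    and 'opioid_recommended' (bool). Returns the rate per arm."""
    counts = defaultdict(lambda: [0, 0])  # label -> [recommended, total]
    for r in responses:
        bucket = counts[r["label"]]
        bucket[0] += r["opioid_recommended"]
        bucket[1] += 1
    return {label: rec / total for label, (rec, total) in counts.items()}

def gaps_vs_control(responses):
    """Percentage-point gap of each demographic arm relative to control."""
    rates = recommendation_rates(responses)
    control = rates.get(None, 0.0)
    return {label: rate - control
            for label, rate in rates.items() if label is not None}
```

Under guideline-concordant behavior, every gap should sit near zero. The study found systematic, group-specific gaps instead.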

Here’s why this is unacceptable:
This isn't clinical variation. It's model-driven bias.
Standard pain management guidelines don't recommend more opioids for specific demographics while flagging those same demographics as higher risk.
That's contradictory reasoning no trained clinician would defend.
The models learned society's contradictory response to marginalized populations: "They're at risk for addiction, so let's prescribe them more opioids anyway."
The deployment question:
When AI encodes contradictory clinical reasoning at scale and deploys it across millions of patient interactions, what happens?
Disparities in anxiety treatment and perceived psychological stress similarly clustered within marginalized populations even when clinical details were identical.
These patterns diverge from evidence-based guidelines. They reflect biases in training data, not sound clinical practice.
My take:
Without rigorous bias evaluation and guideline-based checks, deploying these models amplifies existing healthcare disparities rather than reducing them.
Training on medical literature doesn't produce unbiased outputs. It encodes the biases already present in healthcare delivery and documentation.
Organizations deploying pain management AI need demographic bias testing across socioeconomic and identity categories before clinical use. Not after deployment when patients are already affected.
The technical solution exists: guideline-based guardrails, demographic fairness constraints, regular bias audits, and human oversight of recommendations that show demographic variation.
The question is whether organizations will implement those safeguards before scaling deployment.
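For the audit piece specifically, here's a minimal sketch of a pre-deployment gate built on the same paired control/variant design as the study. The threshold and all names are assumptions for illustration, not a validated clinical tool:

```python
from collections import defaultdict

DIVERGENCE_THRESHOLD = 0.02  # assumed tolerance; a real gate needs clinical sign-off

def audit_divergence(paired_results):
    """paired_results: iterable of (label, control_rec, variant_rec) tuples,
    booleans for the same vignette with and without the identifier.
    Returns the divergence rate per demographic label."""
    counts = defaultdict(lambda: [0, 0])  # label -> [mismatches, pairs]
    for label, control_rec, variant_rec in paired_results:
        bucket = counts[label]
        bucket[0] += control_rec != variant_rec
        bucket[1] += 1
    return {label: mism / total for label, (mism, total) in counts.items()}

def gate_deployment(paired_results, threshold=DIVERGENCE_THRESHOLD):
    """Block deployment (return False) and report any arm over threshold."""
    flagged = {label: rate
               for label, rate in audit_divergence(paired_results).items()
               if rate > threshold}
    for label, rate in sorted(flagged.items(), key=lambda kv: -kv[1]):
        print(f"FLAG {label}: {rate:.1%} of paired cases diverge from control")
    return not flagged
```

The design choice matters: pairing each case with its own no-identifier control isolates the demographic label as the only variable, which is exactly the contradiction this study exposed.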
Physician-Innovator | AI in Healthcare | Child & Adolescent Psychiatrist
P.S. Does your clinical AI undergo demographic bias testing across socioeconomic and identity categories before deployment?
If not, you're assuming the training data is unbiased (and this study shows it isn't).