New study reveals bias in medical imaging AI models

MASSACHUSETTS, UNITED STATES — A recent study by the Massachusetts Institute of Technology (MIT) highlighted why artificial intelligence (AI) models used in medical imaging can exhibit bias.
These models, which can infer a patient’s race, gender, and age from medical images, appear to use these traits as shortcuts when making diagnoses, leading to discrepancies in accuracy across demographic groups.
Uncovering the “fairness gap” in AI diagnostics
AI models have become integral to analyzing medical images, such as X-rays, to assist with diagnoses. However, research has shown that these models do not perform equally well for all demographic groups, often underperforming for women and people of color.
Notably, a 2022 study by MIT researchers revealed that AI models could accurately predict a patient’s race from chest X-rays—an ability beyond the reach of even the most skilled radiologists.
The latest findings from the same research team indicate that the models most accurate at making demographic predictions also exhibit the largest “fairness gaps.”
These gaps refer to the discrepancies in the models’ diagnostic accuracy between different races and genders. The researchers suggest that the models may be using “demographic shortcuts,” leading to incorrect results for certain groups.
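For readers who want a concrete picture of the metric, the short Python sketch below illustrates one common way to quantify such a gap: compute diagnostic accuracy separately for each demographic subgroup and take the largest difference. This is an illustrative example, not code from the study; the function names and toy data are hypothetical.

```python
# Illustrative sketch: a "fairness gap" measured as the largest difference
# in diagnostic accuracy between demographic subgroups (hypothetical data).
import numpy as np

def subgroup_accuracies(y_true, y_pred, groups):
    """Return diagnostic accuracy computed separately for each subgroup."""
    accs = {}
    for g in np.unique(groups):
        mask = groups == g
        accs[g] = float(np.mean(y_true[mask] == y_pred[mask]))
    return accs

def fairness_gap(y_true, y_pred, groups):
    """Largest difference in accuracy between any two subgroups."""
    accs = subgroup_accuracies(y_true, y_pred, groups)
    return max(accs.values()) - min(accs.values())

# Toy example: labels, model predictions, and subgroup membership
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 0, 1, 0])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

print(subgroup_accuracies(y_true, y_pred, groups))  # {'A': 0.75, 'B': 0.5}
print(fairness_gap(y_true, y_pred, groups))         # 0.25
```

In this toy case the model is 75% accurate for group A but only 50% accurate for group B, giving a fairness gap of 0.25; the study's concern is that models relying on demographic shortcuts tend to show larger gaps of this kind.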
Challenges in debiasing AI models
“It’s well-established that high-capacity machine-learning models are good predictors of human demographics such as self-reported race or sex or age. This paper re-demonstrates that capacity, and then links that capacity to the lack of performance across different groups, which has never been done,” said Marzyeh Ghassemi, an MIT associate professor and senior author of the study.
The study also explored methods to retrain the models to improve fairness. The researchers found that debiasing techniques were most effective when the models were tested on the same types of patients they were trained on. However, when applied to patients from different hospitals, the fairness gaps reappeared.
The study’s findings underscore the importance of evaluating AI models on local patient data before deployment.
“You should thoroughly evaluate any external models on your own data because any fairness guarantees that model developers provide on their training data may not transfer to your population,” advised Haoran Zhang, an MIT graduate student and lead author of the paper.
Future directions in AI fairness research
The researchers plan to develop and test additional methods to create models that can make fair predictions across diverse datasets.
This ongoing effort aims to ensure that AI models in medical imaging provide accurate and equitable results for all patient groups.