Assessing the generalizability of artificial intelligence in radiology: a systematic review of performance across different clinical settings.
- Year: 2025
- Authors: Suleman MU et al.
- Affiliation: Department of Internal Medicine
Abstract
Introduction
Artificial intelligence (AI) applications in diagnostic radiology have demonstrated remarkable accuracy on institutional datasets. However, concerns about external generalizability, that is, performance when models encounter data from different hospitals, scanners, or patient populations, remain a major barrier to clinical deployment.
Methods
We performed a systematic review of peer-reviewed studies (January 2022 to June 2025) reporting both internal and external validation of AI diagnostic models applied to computed tomography, magnetic resonance imaging, or X-rays. The review complied with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and Assessing the Methodological Quality of Systematic Reviews guidelines and was registered in the PROSPERO database. PubMed and Embase searches identified 342 records; after de-duplication, screening, and eligibility assessment, six studies met our inclusion criteria. These studies addressed diverse diagnostic tasks using deep learning architectures (3D convolutional neural networks, generative adversarial network-augmented models, no-new-U-Net ensembles, and regulatory-cleared systems).
Results
Internal-validation area under the curve (AUC) ranged from 0.76 to 0.95; sensitivities were generally >85% and specificities >68%. In external validation, AUC declined modestly (median drop ~0.03), with larger decreases in specificity (up to ~24 percentage points). Quality Assessment of Diagnostic Accuracy Studies, Version 2 (QUADAS-2) assessment revealed low overall risk of bias in five studies; one study had high patient-selection bias, and another had unclear sampling. Methods that appeared to enhance generalizability included multicenter training, data augmentation with generative adversarial networks, and incorporation of clinical variables.
Conclusion
AI models in radiology tend to underperform on external data despite strong internal performance. Mandatory external validation on diverse cohorts and cautious clinical integration are recommended.
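To make the internal-versus-external comparison concrete, the following is a minimal sketch of how AUC, sensitivity, and specificity might be computed for two cohorts and then differenced. It is not the method of the review or of any included study. The function names (`sensitivity_specificity`, `auc`, `generalization_gap`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels at a score threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def generalization_gap(
    internal: Tuple[Sequence[int], Sequence[float]],
    external: Tuple[Sequence[int], Sequence[float]],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Internal-minus-external difference per metric; a positive value is a drop."""
    sens_i, spec_i = sensitivity_specificity(*internal, threshold=threshold)
    sens_e, spec_e = sensitivity_specificity(*external, threshold=threshold)
    return {
        "auc": auc(*internal) - auc(*external),
        "sensitivity": sens_i - sens_e,
        "specificity": spec_i - spec_e,
    }


if __name__ == "__main__":
    # Synthetic toy cohorts (hypothetical, not data from any reviewed study).
    internal = ([1, 1, 1, 1, 0, 0, 0, 0], [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1])
    external = ([1, 1, 1, 1, 0, 0, 0, 0], [0.9, 0.8, 0.6, 0.4, 0.7, 0.6, 0.2, 0.1])
    for metric, drop in generalization_gap(internal, external).items():
        print(f"{metric}: drop of {drop:.3f}")
```

Because the threshold is fixed on the internal cohort and reused unchanged on the external one, a shift in the score distribution shows up directly in sensitivity and specificity. That is one plausible reason specificity can fall further than AUC, which does not depend on a threshold.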
Original publication: https://europepmc.org/article/MED/41377327