ReXrank is an open-source leaderboard for AI-powered radiology report generation from chest X-ray images. We're setting a new standard in healthcare AI by providing a comprehensive, objective evaluation framework for cutting-edge models. Our mission is to accelerate progress in this critical field by fostering healthy competition and collaboration among researchers, clinicians, and AI enthusiasts. Using diverse datasets like MIMIC-CXR, IU X-ray, and CheXpert Plus, ReXrank offers a robust benchmarking system that evolves with clinical needs and technological advancements. Our leaderboard showcases top-performing models, driving innovation that could transform patient care and streamline medical workflows.
Join us in shaping the future of AI-assisted radiology. Develop your models, submit your results, and see how you stack up against the best in the field. Together, we can push the boundaries of what's possible in medical imaging and report generation.
To help you evaluate your models, we have made available the evaluation script we will use for official scoring, along with a sample prediction file in the format the script expects as input.
To run the evaluation, use `python evaluate.py <path_to_data> <path_to_predictions>`.
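For illustration, a prediction file could be assembled as in the sketch below. The column names and CSV layout here are placeholders, not the official schema; please follow the sample prediction file we provide for the exact format the script expects.

```python
# Hypothetical sketch of preparing a prediction file before running evaluate.py.
# The column names ("study_id", "report") and the CSV format are illustrative
# assumptions; the sample prediction file defines the real schema.
import pandas as pd

predictions = pd.DataFrame(
    [
        {"study_id": "example_0001", "report": "No acute cardiopulmonary process."},
        {"study_id": "example_0002", "report": "Mild cardiomegaly without pulmonary edema."},
    ]
)
predictions.to_csv("predictions.csv", index=False)
```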
Once you have built a model that works to your expectations on the MIMIC-CXR test set, you can submit it to receive official scores on our private test set. Our Submission Tutorial walks you through the steps for a smooth evaluation process.
Please cite our work if you find the leaderboard helpful.
To keep up to date with major changes to the leaderboard and dataset, please subscribe here!
The overall leaderboard includes the top models across the different datasets. * denotes a model trained on the dataset in question.
Our private test dataset contains 10,000 studies collected from different medical centers in the US.
Rank | Year | Model | Institution | RadCliQ-v1 ↓ | RadCliQ-v0 ↓ | BLEU ↑ | BERTScore ↑ | SembScore ↑ | RadGraph ↑ |
---|---|---|---|---|---|---|---|---|---|
1 | 2024 | CheXagent | Stanford | 1.526 | 3.922 | 0.07 | 0.281 | 0.349 | 0.091 |
2 | 2024 | CheXpertPlus_mimiccxr | Stanford | 1.248 | 3.52 | 0.177 | 0.364 | 0.431 | 0.139 |
3 | 2024 | CheXpertPlus_chexpert | Stanford | 1.323 | 3.656 | 0.165 | 0.333 | 0.395 | 0.148 |
4 | 2024 | CheXpertPlus_chexpert_mimiccxr | Stanford | 1.175 | 3.419 | 0.196 | 0.389 | 0.429 | 0.166 |
5 | 2023 | Cvt2distilgpt2_mimiccxr | CSIRO | 1.284 | 3.592 | 0.163 | 0.318 | 0.44 | 0.163 |
6 | 2023 | Cvt2distilgpt2_iuxray | CSIRO | 1.335 | 3.676 | 0.146 | 0.328 | 0.373 | 0.163 |
7 | 2024 | MedVersa | Harvard | 1.016 | 3.123 | 0.172 | 0.438 | 0.48 | 0.188 |
8 | 2023 | RadFM | SJTU | 1.356 | 3.683 | 0.132 | 0.338 | 0.375 | 0.131 |
9 | 2023 | RaDialog | TUM | 1.268 | 3.54 | 0.148 | 0.342 | 0.437 | 0.147 |
10 | 2023 | RGRG | TUM | 1.199 | 3.451 | 0.175 | 0.358 | 0.448 | 0.172 |
11 | 2023 | VLCI_mimiccxr | SYSU | 1.501 | 3.94 | 0.14 | 0.257 | 0.384 | 0.115 |
12 | 2023 | VLCI_iuxray | SYSU | 1.283 | 3.622 | 0.174 | 0.297 | 0.433 | 0.197 |
13 | 2024 | LLM-CXR | KAIST | 2.369 | 5.388 | 0.032 | -0.082 | 0.201 | 0.013 |
14 | 2024 | GPT4V | OpenAI | 2.316 | 5.318 | 0.055 | -0.065 | 0.208 | 0.028 |
15 | 2024 | BiomedGPT_iu_xray | Lehigh University | 1.361 | 3.675 | 0.079 | 0.287 | 0.411 | 0.166 |
16 | 2024 | BiomedGPT_peir_gross | Lehigh University | 1.911 | 4.562 | 0.015 | 0.127 | 0.269 | 0.049 |
MIMIC-CXR contains 377,110 images corresponding to 227,835 radiographic studies performed at the Beth Israel Deaconess Medical Center in Boston, MA. We follow the official split of MIMIC-CXR in the following experiments and report scores on the test set. * denotes a model trained on MIMIC-CXR.
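If you are reproducing this setup, the official split can be read from the split file distributed with MIMIC-CXR-JPG. The sketch below is a minimal example, assuming the file name `mimic-cxr-2.0.0-split.csv.gz` and its `dicom_id`/`study_id`/`subject_id`/`split` columns; adjust it to your local copy of the dataset.

```python
# Minimal sketch: select the official MIMIC-CXR test split with pandas.
# Assumes the split file shipped with MIMIC-CXR-JPG and its column names;
# point the path at wherever your copy of the dataset lives.
import pandas as pd

split = pd.read_csv("mimic-cxr-2.0.0-split.csv.gz")
test_studies = split.loc[split["split"] == "test", "study_id"].unique()
print(f"{len(test_studies)} studies in the official test split")
```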
Rank | Year | Model | Institution | RadCliQ-v1 ↓ | RadCliQ-v0 ↓ | BLEU ↑ | BERTScore ↑ | SembScore ↑ | RadGraph ↑ |
---|---|---|---|---|---|---|---|---|---|
1 | 2024 | CheXagent* | Stanford | 1.437 | 3.816 | 0.094 | 0.304 | 0.331 | 0.146 |
2 | 2024 | CheXpertPlus_mimiccxr* | Stanford | 1.247 | 3.549 | 0.165 | 0.353 | 0.382 | 0.193 |
3 | 2024 | CheXpertPlus_chexpert | Stanford | 1.399 | 3.787 | 0.127 | 0.3 | 0.342 | 0.173 |
4 | 2024 | CheXpertPlus_chexpert_mimiccxr* | Stanford | 1.211 | 3.491 | 0.166 | 0.362 | 0.391 | 0.203 |
5 | 2023 | Cvt2distilgpt2_mimiccxr* | CSIRO | 1.498 | 3.939 | 0.109 | 0.278 | 0.315 | 0.145 |
6 | 2023 | Cvt2distilgpt2_iuxray | CSIRO | 1.752 | 4.331 | 0.042 | 0.246 | 0.168 | 0.1 |
7 | 2024 | MedVersa* | Harvard | 1.088 | 3.337 | 0.193 | 0.43 | 0.315 | 0.273 |
8 | 2023 | RadFM* | SJTU | 1.604 | 4.093 | 0.081 | 0.281 | 0.245 | 0.111 |
9 | 2023 | RaDialog* | TUM | 1.33 | 3.647 | 0.112 | 0.322 | 0.381 | 0.168 |
10 | 2023 | RGRG* | TUM | 1.363 | 3.723 | 0.125 | 0.323 | 0.337 | 0.176 |
11 | 2023 | VLCI_mimiccxr* | SYSU | 1.573 | 4.081 | 0.124 | 0.252 | 0.293 | 0.136 |
12 | 2023 | VLCI_iuxray | SYSU | 1.78 | 4.398 | 0.061 | 0.211 | 0.19 | 0.107 |
13 | 2024 | LLM-CXR* | KAIST | 1.983 | 4.712 | 0.036 | 0.158 | 0.15 | 0.047 |
14 | 2024 | GPT4V | OpenAI | 1.819 | 4.457 | 0.065 | 0.204 | 0.19 | 0.085 |
15 | 2024 | BiomedGPT_iu_xray | Lehigh University | 1.9 | 4.554 | 0.015 | 0.163 | 0.205 | 0.062 |
16 | 2024 | BiomedGPT_peir_gross | Lehigh University | 2.122 | 4.929 | 0.003 | 0.069 | 0.18 | 0.029 |
IU X-ray contains 7,470 pairs of radiology reports and chest X-rays from Indiana University. We follow the split given by R2Gen. * denotes a model trained on IU X-ray.
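As a rough guide, the R2Gen split is typically consumed from the annotation file shipped with that codebase. The sketch below assumes an `annotation.json` with top-level train/val/test lists, which is an assumption about the R2Gen layout, so verify it against the copy you download.

```python
# Minimal sketch: load an R2Gen-style IU X-ray split.
# The file name and the {"train": [...], "val": [...], "test": [...]} layout
# are assumptions about the R2Gen annotation file; check your local copy.
import json

with open("iu_xray/annotation.json") as f:
    annotations = json.load(f)

test_examples = annotations["test"]
print(f"{len(test_examples)} examples in the R2Gen test split")
```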
Rank | Year | Model | Institution | RadCliQ-v1 ↓ | RadCliQ-v0 ↓ | BLEU ↑ | BERTScore ↑ | SembScore ↑ | RadGraph ↑ |
---|---|---|---|---|---|---|---|---|---|
1 | 2024 | CheXagent | Stanford | 1.137 | 3.272 | 0.102 | 0.38 | 0.494 | 0.157 |
2 | 2024 | CheXpertPlus_mimiccxr | Stanford | 0.884 | 2.916 | 0.227 | 0.449 | 0.594 | 0.187 |
3 | 2024 | CheXpertPlus_chexpert | Stanford | 0.989 | 3.105 | 0.198 | 0.394 | 0.55 | 0.211 |
4 | 2024 | CheXpertPlus_chexpert_mimiccxr | Stanford | 0.78 | 2.769 | 0.244 | 0.476 | 0.598 | 0.232 |
5 | 2023 | Cvt2distilgpt2_mimiccxr | CSIRO | 0.956 | 3.029 | 0.192 | 0.39 | 0.605 | 0.2 |
6 | 2023 | Cvt2distilgpt2_iuxray* | CSIRO | 0.861 | 2.931 | 0.222 | 0.43 | 0.543 | 0.273 |
7 | 2024 | MedVersa | Harvard | 0.692 | 2.581 | 0.195 | 0.518 | 0.601 | 0.244 |
8 | 2023 | RadFM | SJTU | 0.815 | 2.8 | 0.196 | 0.479 | 0.556 | 0.234 |
9 | 2023 | RaDialog | TUM | 0.97 | 3.044 | 0.175 | 0.419 | 0.545 | 0.198 |
10 | 2023 | RGRG | TUM | 0.803 | 2.818 | 0.24 | 0.447 | 0.603 | 0.248 |
11 | 2023 | VLCI_mimiccxr | SYSU | 1.18 | 3.393 | 0.115 | 0.332 | 0.472 | 0.203 |
12 | 2023 | VLCI_iuxray* | SYSU | 0.825 | 2.882 | 0.242 | 0.404 | 0.609 | 0.283 |
13 | 2024 | LLM-CXR | KAIST | 2.072 | 4.863 | 0.035 | 0.175 | 0.061 | 0.023 |
14 | 2024 | GPT4V | OpenAI | 1.462 | 3.856 | 0.079 | 0.235 | 0.403 | 0.16 |
15 | 2024 | BiomedGPT_iu_xray* | Lehigh University | 1.044 | 3.175 | 0.123 | 0.361 | 0.512 | 0.242 |
16 | 2024 | BiomedGPT_peir_gross | Lehigh University | 1.86 | 4.471 | 0.016 | 0.151 | 0.276 | 0.052 |
CheXpert Plus contains 223,228 unique pairs of radiology reports and chest X-rays from 187,711 studies and 64,725 patients. We follow the official split of CheXpert Plus in the following experiments and use the validation set for evaluation. * denotes a model trained on CheXpert Plus.
Rank | Year | Model | Institution | RadCliQ-v1 ↓ | RadCliQ-v0 ↓ | BLEU ↑ | BERTScore ↑ | SembScore ↑ | RadGraph ↑ |
---|---|---|---|---|---|---|---|---|---|
1 | 2024 | CheXagent | Stanford | 2.213 | 5.147 | 0.077 | -0.056 | 0.284 | 0.039 |
2 | 2024 | CheXpertPlus_mimiccxr | Stanford | 2.07 | 4.912 | 0.103 | 0.002 | 0.318 | 0.049 |
3 | 2024 | CheXpertPlus_chexpert* | Stanford | 1.953 | 4.736 | 0.142 | 0.02 | 0.38 | 0.07 |
4 | 2024 | CheXpertPlus_chexpert_mimiccxr* | Stanford | 1.958 | 4.745 | 0.14 | 0.011 | 0.388 | 0.071 |
5 | 2023 | Cvt2distilgpt2_mimiccxr | CSIRO | 2.202 | 5.132 | 0.083 | -0.052 | 0.288 | 0.038 |
6 | 2023 | Cvt2distilgpt2_iuxray | CSIRO | 2.352 | 5.396 | 0.061 | -0.064 | 0.16 | 0.035 |
7 | 2024 | MedVersa | Harvard | 2.031 | 4.831 | 0.09 | 0.013 | 0.337 | 0.05 |
8 | 2023 | RadFM | SJTU | 2.251 | 5.206 | 0.067 | -0.038 | 0.229 | 0.027 |
9 | 2023 | RaDialog | TUM | 2.111 | 4.967 | 0.086 | -0.035 | 0.348 | 0.041 |
10 | 2023 | RGRG | TUM | 2.194 | 5.139 | 0.103 | -0.039 | 0.263 | 0.047 |
11 | 2023 | VLCI_mimiccxr | SYSU | 2.317 | 5.34 | 0.084 | -0.092 | 0.248 | 0.032 |
12 | 2023 | VLCI_iuxray | SYSU | 2.407 | 5.496 | 0.065 | -0.098 | 0.165 | 0.032 |
13 | 2024 | LLM-CXR | KAIST | 2.369 | 5.388 | 0.032 | -0.082 | 0.201 | 0.013 |
14 | 2024 | GPT4V | OpenAI | 2.316 | 5.318 | 0.055 | -0.065 | 0.208 | 0.028 |
15 | 2024 | BiomedGPT_iu_xray | Lehigh University | 2.366 | 5.378 | 0.02 | -0.079 | 0.193 | 0.018 |
16 | 2024 | BiomedGPT_peir_gross | Lehigh University | 2.467 | 5.542 | 0.005 | -0.152 | 0.227 | 0.007 |