ReXrank

Open-Source Radiology Report Generation Leaderboard

What is ReXrank?

ReXrank is an open-source leaderboard for AI-powered radiology report generation from chest X-ray images. We're setting a new standard in healthcare AI by providing a comprehensive, objective evaluation framework for cutting-edge models. Our mission is to accelerate progress in this critical field by fostering healthy competition and collaboration among researchers, clinicians, and AI enthusiasts. Using diverse datasets like MIMIC-CXR, IU-Xray, and CheXpert Plus, ReXrank offers a robust benchmarking system that evolves with clinical needs and technological advancements. Our leaderboard showcases top-performing models, driving innovation that could transform patient care and streamline medical workflows.

Join us in shaping the future of AI-assisted radiology. Develop your models, submit your results, and see how you stack up against the best in the field. Together, we can push the boundaries of what's possible in medical imaging and report generation.

Getting Started

To help you evaluate your models, we have made available the evaluation script that we will use for official evaluation, along with a sample prediction file that the script takes as input. To run the evaluation, use: python evaluate.py <path_to_data> <path_to_predictions>
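As a rough illustration, the command above can also be driven from Python, for example when evaluating several prediction files in a loop. This is only a sketch: the helper names build_eval_command and run_evaluation are hypothetical, the example paths are placeholders, and the only thing taken from the leaderboard is the shape of the command (script name followed by the data path and the predictions path).

```python
import subprocess
import sys
from pathlib import Path


def build_eval_command(data_path, predictions_path, script="evaluate.py"):
    """Return the argv list for: python evaluate.py <path_to_data> <path_to_predictions>."""
    # sys.executable is the interpreter running this code, so the script
    # runs in the same Python environment.
    return [sys.executable, script, str(Path(data_path)), str(Path(predictions_path))]


def run_evaluation(data_path, predictions_path):
    """Run the evaluation script; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_eval_command(data_path, predictions_path), check=True)


# Hypothetical usage (placeholder paths, not the real file layout):
# run_evaluation("path/to/data", "path/to/predictions")
```

Building the argument list separately from running it makes it easy to inspect or log the exact command before anything is executed.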

Once you have built a model that works to your expectations on the MIMIC-CXR test set, you can submit it to get official scores on our private test set. Here's a tutorial on the submission process to ensure a smooth evaluation.

Submission Tutorial

Please cite ReXrank if you find our leaderboard helpful.

To keep up to date with major changes to the leaderboard and dataset, please subscribe here!

Leaderboard Overview

This overview ranks the models on each dataset. * denotes a model trained on that dataset.

| Rank | Private | MIMIC-CXR | IU-Xray | CheXpert Plus |
|---|---|---|---|---|
| 1 | MedVersa (Harvard) | MedVersa* (Harvard) | MedVersa (Harvard) | CheXpertPlus_chexpert* (Stanford) |
| 2 | CheXpertPlus_chexpert_mimiccxr (Stanford) | CheXpertPlus_chexpert_mimiccxr* (Stanford) | CheXpertPlus_chexpert_mimiccxr (Stanford) | CheXpertPlus_chexpert_mimiccxr* (Stanford) |
| 3 | RGRG (TUM) | CheXpertPlus_mimiccxr* (Stanford) | RGRG (TUM) | MedVersa (Harvard) |
| 4 | CheXpertPlus_mimiccxr (Stanford) | RaDialog* (TUM) | RadFM (SJTU) | CheXpertPlus_mimiccxr (Stanford) |
| 5 | RaDialog (TUM) | RGRG* (TUM) | VLCI_iuxray* (SYSU) | RaDialog (TUM) |
| 6 | VLCI_iuxray (SYSU) | CheXpertPlus_chexpert (Stanford) | Cvt2distilgpt2_iuxray* (CSIRO) | RGRG (TUM) |
| 7 | Cvt2distilgpt2_mimiccxr (CSIRO) | CheXagent* (Stanford) | CheXpertPlus_mimiccxr (Stanford) | Cvt2distilgpt2_mimiccxr (CSIRO) |
| 8 | CheXpertPlus_chexpert (Stanford) | Cvt2distilgpt2_mimiccxr* (CSIRO) | Cvt2distilgpt2_mimiccxr (CSIRO) | CheXagent (Stanford) |
| 9 | Cvt2distilgpt2_iuxray (CSIRO) | VLCI_mimiccxr* (SYSU) | RaDialog (TUM) | RadFM (SJTU) |
| 10 | RadFM (SJTU) | RadFM* (SJTU) | CheXpertPlus_chexpert (Stanford) | GPT4V (OpenAI) |
| 11 | BiomedGPT_iu_xray (Lehigh University) | Cvt2distilgpt2_iuxray (CSIRO) | BiomedGPT_iu_xray* (Lehigh University) | VLCI_mimiccxr (SYSU) |
| 12 | VLCI_mimiccxr (SYSU) | VLCI_iuxray (SYSU) | CheXagent (Stanford) | Cvt2distilgpt2_iuxray (CSIRO) |
| 13 | CheXagent (Stanford) | GPT4V (OpenAI) | VLCI_mimiccxr (SYSU) | BiomedGPT_iu_xray (Lehigh University) |
| 14 | BiomedGPT_peir_gross (Lehigh University) | BiomedGPT_iu_xray (Lehigh University) | GPT4V (OpenAI) | LLM-CXR (KAIST) |
| 15 | GPT4V (OpenAI) | LLM-CXR* (KAIST) | BiomedGPT_peir_gross (Lehigh University) | VLCI_iuxray (SYSU) |
| 16 | LLM-CXR (KAIST) | BiomedGPT_peir_gross (Lehigh University) | LLM-CXR (KAIST) | BiomedGPT_peir_gross (Lehigh University) |

Leaderboard on Private Dataset

Our private test dataset contains 10,000 studies collected from different medical centers in the US.

| Rank | Year | Model | Institution | RadCliQ-v1 | RadCliQ-v0 | BLEU | BertScore | SembScore | RadGraph |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 2024 | MedVersa | Harvard | 1.016 | 3.123 | 0.172 | 0.438 | 0.48 | 0.188 |
| 2 | 2024 | CheXpertPlus_chexpert_mimiccxr | Stanford | 1.175 | 3.419 | 0.196 | 0.389 | 0.429 | 0.166 |
| 3 | 2023 | RGRG | TUM | 1.199 | 3.451 | 0.175 | 0.358 | 0.448 | 0.172 |
| 4 | 2024 | CheXpertPlus_mimiccxr | Stanford | 1.248 | 3.52 | 0.177 | 0.364 | 0.431 | 0.139 |
| 5 | 2023 | RaDialog | TUM | 1.268 | 3.54 | 0.148 | 0.342 | 0.437 | 0.147 |
| 6 | 2023 | VLCI_iuxray | SYSU | 1.283 | 3.622 | 0.174 | 0.297 | 0.433 | 0.197 |
| 7 | 2023 | Cvt2distilgpt2_mimiccxr | CSIRO | 1.284 | 3.592 | 0.163 | 0.318 | 0.44 | 0.163 |
| 8 | 2024 | CheXpertPlus_chexpert | Stanford | 1.323 | 3.656 | 0.165 | 0.333 | 0.395 | 0.148 |
| 9 | 2023 | Cvt2distilgpt2_iuxray | CSIRO | 1.335 | 3.676 | 0.146 | 0.328 | 0.373 | 0.163 |
| 10 | 2023 | RadFM | SJTU | 1.356 | 3.683 | 0.132 | 0.338 | 0.375 | 0.131 |
| 11 | 2024 | BiomedGPT_iu_xray | Lehigh University | 1.361 | 3.675 | 0.079 | 0.287 | 0.411 | 0.166 |
| 12 | 2023 | VLCI_mimiccxr | SYSU | 1.501 | 3.94 | 0.14 | 0.257 | 0.384 | 0.115 |
| 13 | 2024 | CheXagent | Stanford | 1.526 | 3.922 | 0.07 | 0.281 | 0.349 | 0.091 |
| 14 | 2024 | BiomedGPT_peir_gross | Lehigh University | 1.911 | 4.562 | 0.015 | 0.127 | 0.269 | 0.049 |
| 15 | 2024 | GPT4V | OpenAI | 2.316 | 5.318 | 0.055 | -0.065 | 0.208 | 0.028 |
| 16 | 2024 | LLM-CXR | KAIST | 2.369 | 5.388 | 0.032 | -0.082 | 0.201 | 0.013 |
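The Private column of the Leaderboard Overview is consistent with ordering models by RadCliQ-v1 in ascending order (RadCliQ is an error-style metric, so lower is better). Assuming that is the ranking rule, which this page does not state explicitly, a minimal sketch of the ordering looks like this. The scores are copied from a subset of the rows above; the function name rank_by_radcliq is hypothetical.

```python
# RadCliQ-v1 scores copied from four rows of the Private dataset leaderboard.
private_scores = {
    "CheXagent": 1.526,
    "RGRG": 1.199,
    "MedVersa": 1.016,
    "CheXpertPlus_chexpert_mimiccxr": 1.175,
}


def rank_by_radcliq(scores):
    """Return (rank, model, score) tuples, lowest RadCliQ-v1 first."""
    # Assumption: a lower RadCliQ-v1 score means a better rank.
    ordered = sorted(scores.items(), key=lambda item: item[1])
    return [(rank, model, score) for rank, (model, score) in enumerate(ordered, start=1)]


for rank, model, score in rank_by_radcliq(private_scores):
    print(f"{rank:>2}  {model:<32} {score:.3f}")
```

With these four models the order comes out as MedVersa, CheXpertPlus_chexpert_mimiccxr, RGRG, CheXagent, which matches their relative order in the overview.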

Leaderboard on MIMIC-CXR Dataset

MIMIC-CXR contains 377,110 images corresponding to 227,835 radiographic studies performed at the Beth Israel Deaconess Medical Center in Boston, MA. We follow the official split of MIMIC-CXR in the following experiments and report scores on the test set. * denotes that the model was trained on MIMIC-CXR.

| Rank | Year | Model | Institution | RadCliQ-v1 | RadCliQ-v0 | BLEU | BertScore | SembScore | RadGraph |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 2024 | MedVersa* | Harvard | 1.088 | 3.337 | 0.193 | 0.43 | 0.315 | 0.273 |
| 2 | 2024 | CheXpertPlus_chexpert_mimiccxr* | Stanford | 1.211 | 3.491 | 0.166 | 0.362 | 0.391 | 0.203 |
| 3 | 2024 | CheXpertPlus_mimiccxr* | Stanford | 1.247 | 3.549 | 0.165 | 0.353 | 0.382 | 0.193 |
| 4 | 2023 | RaDialog* | TUM | 1.33 | 3.647 | 0.112 | 0.322 | 0.381 | 0.168 |
| 5 | 2023 | RGRG* | TUM | 1.363 | 3.723 | 0.125 | 0.323 | 0.337 | 0.176 |
| 6 | 2024 | CheXpertPlus_chexpert | Stanford | 1.399 | 3.787 | 0.127 | 0.3 | 0.342 | 0.173 |
| 7 | 2024 | CheXagent* | Stanford | 1.437 | 3.816 | 0.094 | 0.304 | 0.331 | 0.146 |
| 8 | 2023 | Cvt2distilgpt2_mimiccxr* | CSIRO | 1.498 | 3.939 | 0.109 | 0.278 | 0.315 | 0.145 |
| 9 | 2023 | VLCI_mimiccxr* | SYSU | 1.573 | 4.081 | 0.124 | 0.252 | 0.293 | 0.136 |
| 10 | 2023 | RadFM* | SJTU | 1.604 | 4.093 | 0.081 | 0.281 | 0.245 | 0.111 |
| 11 | 2023 | Cvt2distilgpt2_iuxray | CSIRO | 1.752 | 4.331 | 0.042 | 0.246 | 0.168 | 0.1 |
| 12 | 2023 | VLCI_iuxray | SYSU | 1.78 | 4.398 | 0.061 | 0.211 | 0.19 | 0.107 |
| 13 | 2024 | GPT4V | OpenAI | 1.819 | 4.457 | 0.065 | 0.204 | 0.19 | 0.085 |
| 14 | 2024 | BiomedGPT_iu_xray | Lehigh University | 1.9 | 4.554 | 0.015 | 0.163 | 0.205 | 0.062 |
| 15 | 2024 | LLM-CXR* | KAIST | 1.983 | 4.712 | 0.036 | 0.158 | 0.15 | 0.047 |
| 16 | 2024 | BiomedGPT_peir_gross | Lehigh University | 2.122 | 4.929 | 0.003 | 0.069 | 0.18 | 0.029 |

Leaderboard on IU Xray Dataset

IU Xray contains 7,470 pairs of radiology reports and chest X-rays from Indiana University. We follow the split given by R2Gen. * denotes that the model was trained on IU Xray.

| Rank | Year | Model | Institution | RadCliQ-v1 | RadCliQ-v0 | BLEU | BertScore | SembScore | RadGraph |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 2024 | MedVersa | Harvard | 0.692 | 2.581 | 0.195 | 0.518 | 0.601 | 0.244 |
| 2 | 2024 | CheXpertPlus_chexpert_mimiccxr | Stanford | 0.78 | 2.769 | 0.244 | 0.476 | 0.598 | 0.232 |
| 3 | 2023 | RGRG | TUM | 0.803 | 2.818 | 0.24 | 0.447 | 0.603 | 0.248 |
| 4 | 2023 | RadFM | SJTU | 0.815 | 2.8 | 0.196 | 0.479 | 0.556 | 0.234 |
| 5 | 2023 | VLCI_iuxray* | SYSU | 0.825 | 2.882 | 0.242 | 0.404 | 0.609 | 0.283 |
| 6 | 2023 | Cvt2distilgpt2_iuxray* | CSIRO | 0.861 | 2.931 | 0.222 | 0.43 | 0.543 | 0.273 |
| 7 | 2024 | CheXpertPlus_mimiccxr | Stanford | 0.884 | 2.916 | 0.227 | 0.449 | 0.594 | 0.187 |
| 8 | 2023 | Cvt2distilgpt2_mimiccxr | CSIRO | 0.956 | 3.029 | 0.192 | 0.39 | 0.605 | 0.2 |
| 9 | 2023 | RaDialog | TUM | 0.97 | 3.044 | 0.175 | 0.419 | 0.545 | 0.198 |
| 10 | 2024 | CheXpertPlus_chexpert | Stanford | 0.989 | 3.105 | 0.198 | 0.394 | 0.55 | 0.211 |
| 11 | 2024 | BiomedGPT_iu_xray* | Lehigh University | 1.044 | 3.175 | 0.123 | 0.361 | 0.512 | 0.242 |
| 12 | 2024 | CheXagent | Stanford | 1.137 | 3.272 | 0.102 | 0.38 | 0.494 | 0.157 |
| 13 | 2023 | VLCI_mimiccxr | SYSU | 1.18 | 3.393 | 0.115 | 0.332 | 0.472 | 0.203 |
| 14 | 2024 | GPT4V | OpenAI | 1.462 | 3.856 | 0.079 | 0.235 | 0.403 | 0.16 |
| 15 | 2024 | BiomedGPT_peir_gross | Lehigh University | 1.86 | 4.471 | 0.016 | 0.151 | 0.276 | 0.052 |
| 16 | 2024 | LLM-CXR | KAIST | 2.072 | 4.863 | 0.035 | 0.175 | 0.061 | 0.023 |

Leaderboard on CheXpert Plus Dataset

CheXpert Plus contains 223,228 unique pairs of radiology reports and chest X-rays from 187,711 studies and 64,725 patients. We follow the official split of CheXpert Plus in the following experiments and use the validation set for evaluation. * denotes that the model was trained on CheXpert Plus.

| Rank | Year | Model | Institution | RadCliQ-v1 | RadCliQ-v0 | BLEU | BertScore | SembScore | RadGraph |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 2024 | CheXpertPlus_chexpert* | Stanford | 1.953 | 4.736 | 0.142 | 0.02 | 0.38 | 0.07 |
| 2 | 2024 | CheXpertPlus_chexpert_mimiccxr* | Stanford | 1.958 | 4.745 | 0.14 | 0.011 | 0.388 | 0.071 |
| 3 | 2024 | MedVersa | Harvard | 2.031 | 4.831 | 0.09 | 0.013 | 0.337 | 0.05 |
| 4 | 2024 | CheXpertPlus_mimiccxr | Stanford | 2.07 | 4.912 | 0.103 | 0.002 | 0.318 | 0.049 |
| 5 | 2023 | RaDialog | TUM | 2.111 | 4.967 | 0.086 | -0.035 | 0.348 | 0.041 |
| 6 | 2023 | RGRG | TUM | 2.194 | 5.139 | 0.103 | -0.039 | 0.263 | 0.047 |
| 7 | 2023 | Cvt2distilgpt2_mimiccxr | CSIRO | 2.202 | 5.132 | 0.083 | -0.052 | 0.288 | 0.038 |
| 8 | 2024 | CheXagent | Stanford | 2.213 | 5.147 | 0.077 | -0.056 | 0.284 | 0.039 |
| 9 | 2023 | RadFM | SJTU | 2.251 | 5.206 | 0.067 | -0.038 | 0.229 | 0.027 |
| 10 | 2024 | GPT4V | OpenAI | 2.316 | 5.318 | 0.055 | -0.065 | 0.208 | 0.028 |
| 11 | 2023 | VLCI_mimiccxr | SYSU | 2.317 | 5.34 | 0.084 | -0.092 | 0.248 | 0.032 |
| 12 | 2023 | Cvt2distilgpt2_iuxray | CSIRO | 2.352 | 5.396 | 0.061 | -0.064 | 0.16 | 0.035 |
| 13 | 2024 | BiomedGPT_iu_xray | Lehigh University | 2.366 | 5.378 | 0.02 | -0.079 | 0.193 | 0.018 |
| 14 | 2024 | LLM-CXR | KAIST | 2.369 | 5.388 | 0.032 | -0.082 | 0.201 | 0.013 |
| 15 | 2023 | VLCI_iuxray | SYSU | 2.407 | 5.496 | 0.065 | -0.098 | 0.165 | 0.032 |
| 16 | 2024 | BiomedGPT_peir_gross | Lehigh University | 2.467 | 5.542 | 0.005 | -0.152 | 0.227 | 0.007 |