ReXrank

What is ReXrank?

ReXrank is an open-source leaderboard for AI-powered radiology report generation from chest X-ray images. It provides an objective evaluation framework for report-generation models. Our mission is to accelerate progress in this field by fostering healthy competition and collaboration among researchers, clinicians, and AI enthusiasts. ReXrank benchmarks models on diverse datasets, including MIMIC-CXR, IU-Xray, and CheXpert Plus, and is designed to evolve with clinical needs and technological advances. Our leaderboard showcases top-performing models, driving innovation that could transform patient care and streamline medical workflows.

Join us in shaping the future of AI-assisted radiology. Develop your models, submit your results, and see how you stack up against the best in the field. Together, we can push the boundaries of what's possible in medical imaging and report generation.

Getting Started

To help you evaluate your models, we have made available the evaluation script that we will use for official evaluation, along with a sample prediction file that the script takes as input. To run the evaluation, use: python evaluate.py <path_to_data> <path_to_predictions>
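If you want to call the evaluation from a larger pipeline, the command above can be wrapped in Python. This is a minimal sketch, not part of ReXrank itself: the helper names build_eval_command and run_evaluation are hypothetical, the paths are placeholders, and the only thing taken from the page is the two positional arguments of evaluate.py.

```python
import subprocess
import sys


def build_eval_command(data_path, predictions_path, script="evaluate.py"):
    """Build the argument list for: python evaluate.py <path_to_data> <path_to_predictions>."""
    # sys.executable reuses the interpreter that is running this code.
    return [sys.executable, script, str(data_path), str(predictions_path)]


def run_evaluation(data_path, predictions_path, script="evaluate.py"):
    """Run the evaluation script; raises CalledProcessError on a non-zero exit."""
    command = build_eval_command(data_path, predictions_path, script)
    return subprocess.run(command, check=True)


# Placeholder paths, shown only to illustrate the argument order.
print(build_eval_command("path/to/data", "path/to/predictions"))
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.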

Once you have built a model that works to your expectations on the MIMIC-CXR test set, you can submit it to get official scores on our private test set. Here's a tutorial on the submission process to help ensure a smooth evaluation.

Submission Tutorial

Please cite ReXrank if you find our leaderboard helpful.

To keep up to date with major changes to the leaderboard and dataset, please subscribe here!

Leaderboard Overview

The table below lists the top models on each dataset. * denotes a model trained on that dataset.

Rank	Private	MIMIC-CXR	IU-Xray	CheXpert Plus
1	MedVersa Harvard	MedVersa* Harvard	MedVersa Harvard	CheXpertPlus_chexpert* Stanford
2	CheXpertPlus_chexpert_mimiccxr Stanford	CheXpertPlus_chexpert_mimiccxr* Stanford	CheXpertPlus_chexpert_mimiccxr Stanford	CheXpertPlus_chexpert_mimiccxr* Stanford
3	RGRG TUM	CheXpertPlus_mimiccxr* Stanford	RGRG TUM	MedVersa Harvard
4	CheXpertPlus_mimiccxr Stanford	RaDialog* TUM	RadFM SJTU	CheXpertPlus_mimiccxr Stanford
5	RaDialog TUM	RGRG* TUM	VLCI_iuxray* SYSU	RaDialog TUM
6	VLCI_iuxray SYSU	CheXpertPlus_chexpert Stanford	Cvt2distilgpt2_iuxray* CSIRO	RGRG TUM
7	Cvt2distilgpt2_mimiccxr CSIRO	CheXagent* Stanford	CheXpertPlus_mimiccxr Stanford	Cvt2distilgpt2_mimiccxr CSIRO
8	CheXpertPlus_chexpert Stanford	Cvt2distilgpt2_mimiccxr* CSIRO	Cvt2distilgpt2_mimiccxr CSIRO	CheXagent Stanford
9	Cvt2distilgpt2_iuxray CSIRO	VLCI_mimiccxr* SYSU	RaDialog TUM	RadFM SJTU
10	RadFM SJTU	RadFM* SJTU	CheXpertPlus_chexpert Stanford	GPT4V OpenAI
11	BiomedGPT_iu_xray Lehigh University	Cvt2distilgpt2_iuxray CSIRO	BiomedGPT_iu_xray* Lehigh University	VLCI_mimiccxr SYSU
12	VLCI_mimiccxr SYSU	VLCI_iuxray SYSU	CheXagent Stanford	Cvt2distilgpt2_iuxray CSIRO
13	CheXagent Stanford	GPT4V OpenAI	VLCI_mimiccxr SYSU	BiomedGPT_iu_xray Lehigh University
14	BiomedGPT_peir_gross Lehigh University	BiomedGPT_iu_xray Lehigh University	GPT4V OpenAI	LLM-CXR KAIST
15	GPT4V OpenAI	LLM-CXR* KAIST	BiomedGPT_peir_gross Lehigh University	VLCI_iuxray SYSU
16	LLM-CXR KAIST	BiomedGPT_peir_gross Lehigh University	LLM-CXR KAIST	BiomedGPT_peir_gross Lehigh University

Leaderboard on Private Dataset

Our private test dataset contains 10,000 studies collected from different medical centers in the US.

Rank Year	Model Institution	RadCliQ-v1 ↓	RadCliQ-v0 ↓	BLEU ↑	BertScore ↑	SembScore ↑	RadGraph ↑
1 2024	CheXagent Stanford	1.526	3.922	0.07	0.281	0.349	0.091
2 2024	CheXpertPlus_mimiccxr Stanford	1.248	3.52	0.177	0.364	0.431	0.139
3 2024	CheXpertPlus_chexpert Stanford	1.323	3.656	0.165	0.333	0.395	0.148
4 2024	CheXpertPlus_chexpert_mimiccxr Stanford	1.175	3.419	0.196	0.389	0.429	0.166
5 2023	Cvt2distilgpt2_mimiccxr CSIRO	1.284	3.592	0.163	0.318	0.44	0.163
6 2023	Cvt2distilgpt2_iuxray CSIRO	1.335	3.676	0.146	0.328	0.373	0.163
7 2024	MedVersa Harvard	1.016	3.123	0.172	0.438	0.48	0.188
8 2023	RadFM SJTU	1.356	3.683	0.132	0.338	0.375	0.131
9 2023	RaDialog TUM	1.268	3.54	0.148	0.342	0.437	0.147
10 2023	RGRG TUM	1.199	3.451	0.175	0.358	0.448	0.172
11 2023	VLCI_mimiccxr SYSU	1.501	3.94	0.14	0.257	0.384	0.115
12 2023	VLCI_iuxray SYSU	1.283	3.622	0.174	0.297	0.433	0.197
13 2024	LLM-CXR KAIST	2.369	5.388	0.032	-0.082	0.201	0.013
14 2024	GPT4V OpenAI	2.316	5.318	0.055	-0.065	0.208	0.028
15 2024	BiomedGPT_iu_xray Lehigh University	1.361	3.675	0.079	0.287	0.411	0.166
16 2024	BiomedGPT_peir_gross Lehigh University	1.911	4.562	0.015	0.127	0.269	0.049
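The Private column of the overview table matches the order you get by sorting the rows above by RadCliQ-v1, lowest first. The sketch below reproduces that ordering from the published scores. That RadCliQ-v1 is the ranking key is an inference from the numbers, not something stated on this page, and the helper name rank_by_score is hypothetical.

```python
# RadCliQ-v1 scores copied from the private-dataset table above.
PRIVATE_RADCLIQ_V1 = {
    "CheXagent": 1.526,
    "CheXpertPlus_mimiccxr": 1.248,
    "CheXpertPlus_chexpert": 1.323,
    "CheXpertPlus_chexpert_mimiccxr": 1.175,
    "Cvt2distilgpt2_mimiccxr": 1.284,
    "Cvt2distilgpt2_iuxray": 1.335,
    "MedVersa": 1.016,
    "RadFM": 1.356,
    "RaDialog": 1.268,
    "RGRG": 1.199,
    "VLCI_mimiccxr": 1.501,
    "VLCI_iuxray": 1.283,
    "LLM-CXR": 2.369,
    "GPT4V": 2.316,
    "BiomedGPT_iu_xray": 1.361,
    "BiomedGPT_peir_gross": 1.911,
}


def rank_by_score(scores, lower_is_better=True):
    """Return model names ordered from best to worst for one metric."""
    ordered = sorted(scores.items(), key=lambda item: item[1],
                     reverse=not lower_is_better)
    return [model for model, _ in ordered]


ranking = rank_by_score(PRIVATE_RADCLIQ_V1)
for position, model in enumerate(ranking, start=1):
    print(position, model)
```

For a metric marked ↑, such as BLEU, pass lower_is_better=False.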

Leaderboard on MIMIC-CXR Dataset

MIMIC-CXR contains 377,110 images corresponding to 227,835 radiographic studies performed at the Beth Israel Deaconess Medical Center in Boston, MA. We follow the official split of MIMIC-CXR in the following experiments and report scores on the test set. * denotes that the model was trained on MIMIC-CXR.

Rank Year	Model Institution	RadCliQ-v1 ↓	RadCliQ-v0 ↓	BLEU ↑	BertScore ↑	SembScore ↑	RadGraph ↑
1 2024	CheXagent* Stanford	1.437	3.816	0.094	0.304	0.331	0.146
2 2024	CheXpertPlus_mimiccxr* Stanford	1.247	3.549	0.165	0.353	0.382	0.193
3 2024	CheXpertPlus_chexpert Stanford	1.399	3.787	0.127	0.3	0.342	0.173
4 2024	CheXpertPlus_chexpert_mimiccxr* Stanford	1.211	3.491	0.166	0.362	0.391	0.203
5 2023	Cvt2distilgpt2_mimiccxr* CSIRO	1.498	3.939	0.109	0.278	0.315	0.145
6 2023	Cvt2distilgpt2_iuxray CSIRO	1.752	4.331	0.042	0.246	0.168	0.1
7 2024	MedVersa* Harvard	1.088	3.337	0.193	0.43	0.315	0.273
8 2023	RadFM* SJTU	1.604	4.093	0.081	0.281	0.245	0.111
9 2023	RaDialog* TUM	1.33	3.647	0.112	0.322	0.381	0.168
10 2023	RGRG* TUM	1.363	3.723	0.125	0.323	0.337	0.176
11 2023	VLCI_mimiccxr* SYSU	1.573	4.081	0.124	0.252	0.293	0.136
12 2023	VLCI_iuxray SYSU	1.78	4.398	0.061	0.211	0.19	0.107
13 2024	LLM-CXR* KAIST	1.983	4.712	0.036	0.158	0.15	0.047
14 2024	GPT4V OpenAI	1.819	4.457	0.065	0.204	0.19	0.085
15 2024	BiomedGPT_iu_xray Lehigh University	1.9	4.554	0.015	0.163	0.205	0.062
16 2024	BiomedGPT_peir_gross Lehigh University	2.122	4.929	0.003	0.069	0.18	0.029
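Each row in these tables packs several fields into its first two cells: the row number and year share one cell, and the model name, institution, and the * marker share another. The following sketch splits a row into typed fields. It assumes cells are tab-separated and that model names contain no spaces, which holds for the rows on this page. LeaderboardRow and parse_row are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LeaderboardRow:
    rank: int
    year: int
    model: str
    institution: str
    trained_on_dataset: bool  # True when the row carries the * marker
    scores: List[float]       # metric values in table column order


def parse_row(line):
    """Parse one tab-separated leaderboard row into a LeaderboardRow."""
    cells = line.rstrip("\n").split("\t")
    rank, year = cells[0].split()
    label = cells[1]
    # Accept the marker wherever it appears in the model cell.
    trained = "*" in label
    # Split on the first space only: institutions may contain spaces.
    model, institution = label.replace("*", "").split(" ", 1)
    scores = [float(cell) for cell in cells[2:]]
    return LeaderboardRow(int(rank), int(year), model, institution.strip(),
                          trained, scores)


row = parse_row("7 2024\tMedVersa* Harvard\t1.088\t3.337\t0.193\t0.43\t0.315\t0.273")
print(row.model, row.institution, row.trained_on_dataset, row.scores[0])
```

Splitting the model cell on the first space keeps multi-word institutions such as "Lehigh University" intact.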

Leaderboard on IU-Xray Dataset

IU-Xray contains 7,470 pairs of radiology reports and chest X-rays from Indiana University. We follow the split given by R2Gen. * denotes that the model was trained on IU-Xray.

Rank Year	Model Institution	RadCliQ-v1 ↓	RadCliQ-v0 ↓	BLEU ↑	BertScore ↑	SembScore ↑	RadGraph ↑
1 2024	CheXagent Stanford	1.137	3.272	0.102	0.38	0.494	0.157
2 2024	CheXpertPlus_mimiccxr Stanford	0.884	2.916	0.227	0.449	0.594	0.187
3 2024	CheXpertPlus_chexpert Stanford	0.989	3.105	0.198	0.394	0.55	0.211
4 2024	CheXpertPlus_chexpert_mimiccxr Stanford	0.78	2.769	0.244	0.476	0.598	0.232
5 2023	Cvt2distilgpt2_mimiccxr CSIRO	0.956	3.029	0.192	0.39	0.605	0.2
6 2023	Cvt2distilgpt2_iuxray* CSIRO	0.861	2.931	0.222	0.43	0.543	0.273
7 2024	MedVersa Harvard	0.692	2.581	0.195	0.518	0.601	0.244
8 2023	RadFM SJTU	0.815	2.8	0.196	0.479	0.556	0.234
9 2023	RaDialog TUM	0.97	3.044	0.175	0.419	0.545	0.198
10 2023	RGRG TUM	0.803	2.818	0.24	0.447	0.603	0.248
11 2023	VLCI_mimiccxr SYSU	1.18	3.393	0.115	0.332	0.472	0.203
12 2023	VLCI_iuxray* SYSU	0.825	2.882	0.242	0.404	0.609	0.283
13 2024	LLM-CXR KAIST	2.072	4.863	0.035	0.175	0.061	0.023
14 2024	GPT4V OpenAI	1.462	3.856	0.079	0.235	0.403	0.16
15 2024	BiomedGPT_iu_xray* Lehigh University	1.044	3.175	0.123	0.361	0.512	0.242
16 2024	BiomedGPT_peir_gross Lehigh University	1.86	4.471	0.016	0.151	0.276	0.052

Leaderboard on CheXpert Plus Dataset

CheXpert Plus contains 223,228 unique pairs of radiology reports and chest X-rays from 187,711 studies and 64,725 patients. We follow the official split of CheXpert Plus in the following experiments and use the validation set for evaluation. * denotes that the model was trained on CheXpert Plus.

Rank Year	Model Institution	RadCliQ-v1 ↓	RadCliQ-v0 ↓	BLEU ↑	BertScore ↑	SembScore ↑	RadGraph ↑
1 2024	CheXagent Stanford	2.213	5.147	0.077	-0.056	0.284	0.039
2 2024	CheXpertPlus_mimiccxr Stanford	2.07	4.912	0.103	0.002	0.318	0.049
3 2024	CheXpertPlus_chexpert* Stanford	1.953	4.736	0.142	0.02	0.38	0.07
4 2024	CheXpertPlus_chexpert_mimiccxr* Stanford	1.958	4.745	0.14	0.011	0.388	0.071
5 2023	Cvt2distilgpt2_mimiccxr CSIRO	2.202	5.132	0.083	-0.052	0.288	0.038
6 2023	Cvt2distilgpt2_iuxray CSIRO	2.352	5.396	0.061	-0.064	0.16	0.035
7 2024	MedVersa Harvard	2.031	4.831	0.09	0.013	0.337	0.05
8 2023	RadFM SJTU	2.251	5.206	0.067	-0.038	0.229	0.027
9 2023	RaDialog TUM	2.111	4.967	0.086	-0.035	0.348	0.041
10 2023	RGRG TUM	2.194	5.139	0.103	-0.039	0.263	0.047
11 2023	VLCI_mimiccxr SYSU	2.317	5.34	0.084	-0.092	0.248	0.032
12 2023	VLCI_iuxray SYSU	2.407	5.496	0.065	-0.098	0.165	0.032
13 2024	LLM-CXR KAIST	2.369	5.388	0.032	-0.082	0.201	0.013
14 2024	GPT4V OpenAI	2.316	5.318	0.055	-0.065	0.208	0.028
15 2024	BiomedGPT_iu_xray Lehigh University	2.366	5.378	0.02	-0.079	0.193	0.018
16 2024	BiomedGPT_peir_gross Lehigh University	2.467	5.542	0.005	-0.152	0.227	0.007
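The arrows in the column headers indicate which direction is better: ↓ means lower is better, ↑ means higher is better. As a rough sketch, the code below reads that direction from each header cell and picks the leading model per metric on the CheXpert Plus rows above. The function name best_per_metric is hypothetical, and the scores are copied from the table.

```python
# Metric header cells from the table above; the trailing arrow gives the direction.
HEADER = ["RadCliQ-v1 ↓", "RadCliQ-v0 ↓", "BLEU ↑",
          "BertScore ↑", "SembScore ↑", "RadGraph ↑"]

# (model, scores in header order), copied from the CheXpert Plus table above.
ROWS = [
    ("CheXagent", [2.213, 5.147, 0.077, -0.056, 0.284, 0.039]),
    ("CheXpertPlus_mimiccxr", [2.07, 4.912, 0.103, 0.002, 0.318, 0.049]),
    ("CheXpertPlus_chexpert", [1.953, 4.736, 0.142, 0.02, 0.38, 0.07]),
    ("CheXpertPlus_chexpert_mimiccxr", [1.958, 4.745, 0.14, 0.011, 0.388, 0.071]),
    ("Cvt2distilgpt2_mimiccxr", [2.202, 5.132, 0.083, -0.052, 0.288, 0.038]),
    ("Cvt2distilgpt2_iuxray", [2.352, 5.396, 0.061, -0.064, 0.16, 0.035]),
    ("MedVersa", [2.031, 4.831, 0.09, 0.013, 0.337, 0.05]),
    ("RadFM", [2.251, 5.206, 0.067, -0.038, 0.229, 0.027]),
    ("RaDialog", [2.111, 4.967, 0.086, -0.035, 0.348, 0.041]),
    ("RGRG", [2.194, 5.139, 0.103, -0.039, 0.263, 0.047]),
    ("VLCI_mimiccxr", [2.317, 5.34, 0.084, -0.092, 0.248, 0.032]),
    ("VLCI_iuxray", [2.407, 5.496, 0.065, -0.098, 0.165, 0.032]),
    ("LLM-CXR", [2.369, 5.388, 0.032, -0.082, 0.201, 0.013]),
    ("GPT4V", [2.316, 5.318, 0.055, -0.065, 0.208, 0.028]),
    ("BiomedGPT_iu_xray", [2.366, 5.378, 0.02, -0.079, 0.193, 0.018]),
    ("BiomedGPT_peir_gross", [2.467, 5.542, 0.005, -0.152, 0.227, 0.007]),
]


def best_per_metric(header, rows):
    """Map each metric name to the model with the best score for it."""
    best = {}
    for index, cell in enumerate(header):
        name, arrow = cell.rsplit(" ", 1)
        # A down arrow means lower is better, so take the minimum.
        pick = min if arrow == "↓" else max
        model, _ = pick(rows, key=lambda row: row[1][index])
        best[name] = model
    return best


for metric, model in best_per_metric(HEADER, ROWS).items():
    print(metric, model)
```

On these rows, CheXpertPlus_chexpert leads on RadCliQ-v1, RadCliQ-v0, BLEU, and BertScore, while CheXpertPlus_chexpert_mimiccxr leads on SembScore and RadGraph.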