Does the CheXpert dataset include reports now?
Hi,
Thank you very much for releasing the source code of your work. I noticed that you use CheXpert for multimodal pre-training of your model. However, as far as I'm aware, the CheXpert dataset does not include the actual reports, only labels extracted from the reports. I understand that people doing research on images + reports usually use datasets like MIMIC-CXR and Open-I. In fact, a fellow researcher downloaded the CheXpert dataset a couple of years ago and confirmed that it only came with labels and images, but no reports.

However, I'm checking the dataset's website right now (here: https://stanfordaimi.azurewebsites.net/datasets/8cbd9ed4-2eb9-4565-affc-111cf4f7ebe2) and just noticed that they have included new labels, i.e., CheXbert- and VisualCheXbert-generated labels, and that the size of CheXpert-v1.0.zip has increased to 471.12 GB (the copy we downloaded a couple of years ago weighs 439 GB).
Question: does that mean that the CheXpert dataset includes reports now? If so, that would be spectacular, because it would mean we could experiment with CheXpert + MIMIC-CXR instead of just using MIMIC-CXR alone (or CheXpert alone for that matter).
In fact, I'm curious: is there a particular reason why you didn't include MIMIC-CXR in your experiments (https://physionet.org/content/mimic-cxr/2.0.0/)?
Kind regards, Pablo
I have the same problem. Can we get the "master_updated.csv" file now?
In my opinion, CheXpert does not provide reports; the experiments in this repository rely on a few human-written scripts.
CheXbert and VisualCheXbert are labels produced by newer automatic labelers for CheXpert. In other words, each CheXpert image now comes with three sets of labels: labelV1 (the original labeler), labelV2 (CheXbert), and labelV3 (VisualCheXbert).
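If you want to compare the three label versions for the same studies, a minimal pandas sketch could look like the one below. The CheXbert/VisualCheXbert file names and the shared "Path" key are assumptions about how the Stanford AIMI release is laid out; only CheXpert-v1.0/train.csv is guaranteed to be part of the original download.

```python
import pandas as pd

# File names below are assumptions about the release layout; adjust them to
# whatever the Stanford AIMI download actually contains.
labeler_df = pd.read_csv("CheXpert-v1.0/train.csv")        # original rule-based labeler (labelV1)
chexbert_df = pd.read_csv("train_cheXbert.csv")            # assumed CheXbert label file (labelV2)
visual_df = pd.read_csv("train_visualCheXbert.csv")        # assumed VisualCheXbert label file (labelV3)

# Assuming all files are keyed by the image path, the label versions for the
# same study can be joined and compared side by side.
merged = labeler_df.merge(chexbert_df, on="Path", suffixes=("_labeler", "_chexbert"))
print(merged[["Path", "Cardiomegaly_labeler", "Cardiomegaly_chexbert"]].head())
```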