
Race labels for MIMIC-CXR?

Open robintibor opened this issue 3 years ago • 5 comments

Hi,

I was wondering how to obtain the race labels for MIMIC-CXR.

I do have access to https://physionet.org/content/mimic-cxr/2.0.0/ and https://physionet.org/content/mimic-cxr-jpg/2.0.0/ but could not locate where you get the white/asian/black labels?

For example, how do you create the modified_viewposition_race_4-race-ethnicity_60-10-30_split_with_gender_age_ver_b.csv that you use in the training code?

Thanks for any help, Best, Robin

robintibor avatar Sep 20 '21 20:09 robintibor

Hi Robin,

Race labels can be found here, under the core directory, in the admissions dataset. From there you can join the admissions subject_id with the CXR subject_id.
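
As a rough illustration of the join described above, here is a minimal pandas sketch. It assumes both tables are already loaded as dataframes with a `subject_id` column and that the admissions table has an `ethnicity` column; the function name `attach_ethnicity`, the `dicom_id` column, and the toy data are made up for the example and are not taken from this repo.

```python
import pandas as pd


def attach_ethnicity(cxr_df: pd.DataFrame, admissions_df: pd.DataFrame) -> pd.DataFrame:
    """Left-join ethnicity from the admissions table onto CXR rows by subject_id."""
    # admissions has one row per hospital stay, so collapse it to the
    # distinct (subject_id, ethnicity) pairs before joining.
    ethnicity_df = admissions_df[["subject_id", "ethnicity"]].drop_duplicates()
    # how="left" keeps every CXR row; subjects missing from admissions get NaN.
    # Caveat: a subject recorded with several different ethnicity values
    # yields several rows per image after this join.
    return cxr_df.merge(ethnicity_df, on="subject_id", how="left")


# Toy data standing in for the real CSV files (hypothetical values).
cxr_toy = pd.DataFrame({"subject_id": [1, 1, 2, 3], "dicom_id": ["a", "b", "c", "d"]})
adm_toy = pd.DataFrame({"subject_id": [1, 1, 2], "ethnicity": ["WHITE", "WHITE", "ASIAN"]})
labeled = attach_ethnicity(cxr_toy, adm_toy)
print(labeled)
```

A left join is used here so that images without a matching admission stay visible as missing labels instead of silently disappearing.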

Let us know if we can help with anything else!

blackboxradiology avatar Sep 20 '21 20:09 blackboxradiology

Ah, amazing, thanks, that clears it up! Another question: am I understanding correctly that there is some code that preprocesses MIMIC-CXR and is not in this repo? That is, one cannot just follow:

  1. Fork/Download the GitHub repository.
  2. Fetch the data from the data URLs for open-source datasets and drop them in the data folder.
  3. Run the corresponding training code and save the trained model in the models folder.

for MIMIC-CXR, because https://github.com/Emory-HITI/AI-Vengers/blob/cbdf593b0d852e3078abbc72cf92aad03496511d/training_code/CXR_training/MIMIC/MIMIC_resnet34_race_detection_2021_06_29.ipynb starts from some dataframe that you have created with some code that is not in this repo?

robintibor avatar Sep 20 '21 21:09 robintibor

That's correct. At the moment you would have to join the CSV files yourself and make your own train-val-test splits, as we did for modified_viewposition_race_4-race-ethnicity_60-10-30_split_with_gender_age_ver_b.csv
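
For anyone building their own split, here is one possible sketch of a 60-10-30 train-val-test split (the proportions are taken from the CSV filename). Splitting at the subject level, so that no patient ends up in more than one split, is an assumption on my part and not something this thread confirms; `split_by_subject`, the `split` column, its values, and the seed are all hypothetical names.

```python
import numpy as np
import pandas as pd


def split_by_subject(df: pd.DataFrame, fractions=(0.6, 0.1, 0.3), seed=0) -> pd.DataFrame:
    """Assign every row to train/validate/test so that a subject stays in one split."""
    rng = np.random.default_rng(seed)
    # Shuffle the unique subjects, not the image rows.
    subjects = rng.permutation(df["subject_id"].unique())
    n_train = int(round(fractions[0] * len(subjects)))
    n_val = int(round(fractions[1] * len(subjects)))
    # Everything after the train and validate blocks falls into test.
    labels = np.array(["test"] * len(subjects), dtype=object)
    labels[:n_train] = "train"
    labels[n_train:n_train + n_val] = "validate"
    subject_to_split = pd.Series(labels, index=subjects)
    out = df.copy()
    out["split"] = out["subject_id"].map(subject_to_split)
    return out


# Toy example: 10 subjects with 2 images each (hypothetical data).
toy = pd.DataFrame({"subject_id": np.repeat(np.arange(10), 2)})
split_toy = split_by_subject(toy)
print(split_toy.drop_duplicates("subject_id")["split"].value_counts())
```

Note that the fractions apply to subjects, so the share of images in each split can drift from 60-10-30 when subjects have different numbers of images.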

blackboxradiology avatar Sep 20 '21 21:09 blackboxradiology

I see. One more question that came up: did you try to handle subjects with multiple values for ethnicity in any way? For example, the following code shows that there are 168 subjects who were entered as both BLACK/AFRICAN AMERICAN and WHITE, and 2489 subjects entered as both OTHER and WHITE:

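import os
import pandas as pd

# mimic_folder is the directory that holds admissions.csv
admissions_df = pd.read_csv(os.path.join(mimic_folder, 'admissions.csv'))
# One row per distinct (subject_id, ethnicity) pair
ethnicity_df = admissions_df.loc[:, ['subject_id', 'ethnicity']].drop_duplicates()

# Subjects that appear with more than one ethnicity value
v = ethnicity_df.subject_id.value_counts()
subject_id_more_than_once = v.index[v.gt(1)]

ambiguous_ethnicity_df = ethnicity_df[ethnicity_df.subject_id.isin(subject_id_more_than_once)]

# Count how often each combination of ethnicity values occurs
grouped = ambiguous_ethnicity_df.groupby('subject_id')
grouped.aggregate(lambda x: "_".join(sorted(x))).ethnicity.value_counts()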
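
One possible way to handle such subjects, offered only as a sketch and not as what the authors did, is to drop every subject with more than one distinct ethnicity value before joining. The function name `drop_ambiguous_subjects` and the toy data are hypothetical; the column names follow the snippet above.

```python
import pandas as pd


def drop_ambiguous_subjects(admissions_df: pd.DataFrame) -> pd.DataFrame:
    """Return one (subject_id, ethnicity) row per subject with a single recorded value."""
    ethnicity_df = admissions_df[["subject_id", "ethnicity"]].drop_duplicates()
    # Number of distinct ethnicity values recorded for each subject.
    n_values = ethnicity_df.groupby("subject_id")["ethnicity"].nunique()
    # Keep only subjects with exactly one distinct value.
    unambiguous_ids = n_values.index[n_values.eq(1)]
    kept = ethnicity_df[ethnicity_df["subject_id"].isin(unambiguous_ids)]
    return kept.reset_index(drop=True)


# Toy data: subject 2 was entered with two different values (hypothetical).
adm_toy = pd.DataFrame({
    "subject_id": [1, 1, 2, 2, 3],
    "ethnicity": ["WHITE", "WHITE", "WHITE", "BLACK/AFRICAN AMERICAN", "ASIAN"],
})
clean = drop_ambiguous_subjects(adm_toy)
print(clean)
```

Dropping is the most conservative choice; other options, such as keeping the most frequent value per subject, would retain more data at the cost of possibly keeping wrong labels.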

robintibor avatar Sep 21 '21 11:09 robintibor

Wow! Great catch! As far as I know, we were unaware of this multiple-ethnicity problem. I will look into it and rerun our tests with these subjects handled. I suspect it could improve performance by reducing noise from mislabeled patients.

Thank you!

blackboxradiology avatar Sep 21 '21 11:09 blackboxradiology