No distinction between ethnicity and race in MIMIC-IV

Open broganjb opened this issue 2 years ago • 2 comments

Prerequisites

  • [ X] Put an X between the brackets on this line if you have done all of the following:
    • Checked the online documentation: https://mimic.mit.edu/
    • Checked that your issue isn't already addressed: https://github.com/MIT-LCP/mimic-code/issues?utf8=%E2%9C%93&q=

Description

When using MIMIC-IV as a source of validation data, my colleagues and I realized that there is no distinction between race and ethnicity in the admissions table of mimic_core.

We ran the following query

select distinct ethnicity from mimic_core.admissions;

and received the following 8 distinct outputs

BLACK/AFRICAN AMERICAN
AMERICAN INDIAN/ALASKA NATIVE
UNABLE TO OBTAIN
ASIAN
OTHER
UNKNOWN
HISPANIC/LATINO
WHITE

We were wondering how others have dealt with the lack of distinction between race and ethnicity. The US Census Bureau guidelines split ethnicity into Hispanic/Latino and not Hispanic/Latino:

https://www.cosb.us/home/showpublisheddocument/5935/637356700118370000.

However, the ethnicity variable in MIMIC-IV also contains what we commonly define as race (i.e. white, black, asian, american indian/alaska native). It would be great to get to the bottom of this as the demographics of MIMIC-IV are important for reporting, especially when comparing model performance across subpopulations to describe potential ethnic and racial disparities. There appear to be some subject_ids that have a race for one admission and an ethnicity for another admission, which further confuses reporting (example: subject_id==15743696).
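
A minimal sketch of how such subjects could be listed, assuming the (subject_id, hadm_id, ethnicity) rows have already been exported from mimic_core.admissions. The sample rows and the function name `inconsistent_subjects` are hypothetical and only illustrate the check.

```python
from collections import defaultdict

# Hypothetical sample rows of (subject_id, hadm_id, ethnicity); real rows
# would come from mimic_core.admissions.
admissions = [
    (1, 101, "WHITE"),
    (1, 102, "HISPANIC/LATINO"),
    (2, 201, "ASIAN"),
    (2, 202, "ASIAN"),
    (3, 301, "UNKNOWN"),
]


def inconsistent_subjects(rows):
    """Map each subject_id with more than one distinct value to its sorted values."""
    values = defaultdict(set)
    for subject_id, _hadm_id, ethnicity in rows:
        values[subject_id].add(ethnicity)
    return {sid: sorted(v) for sid, v in values.items() if len(v) > 1}


conflicts = inconsistent_subjects(admissions)
print(conflicts)
```

Only subject 1 is reported here, because it is the only one whose admissions disagree.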

Lastly, we are trying to figure out how others have reconciled multiple ethnicities being reported for the same subject_id across different hadm_ids. We understand that data are not always complete, but is it standard practice to report a specified race (e.g. BLACK/AFRICAN AMERICAN) rather than OTHER if a subject_id has at least one admission with the specified race?
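
One possible version of that rule, as a sketch rather than an established convention: prefer a specific value whenever a subject has one, and flag subjects whose specific values disagree. Treating OTHER, UNKNOWN, and UNABLE TO OBTAIN as uninformative is an assumption, and the "MULTIPLE" label and the function name `reconcile` are hypothetical.

```python
from collections import Counter, defaultdict

# Assumption: these values carry no usable race/ethnicity information.
UNINFORMATIVE = {"OTHER", "UNKNOWN", "UNABLE TO OBTAIN"}


def reconcile(rows):
    """Return one value per subject_id, preferring specific over uninformative values."""
    by_subject = defaultdict(list)
    for subject_id, ethnicity in rows:
        by_subject[subject_id].append(ethnicity)

    result = {}
    for subject_id, values in by_subject.items():
        specific = [v for v in values if v not in UNINFORMATIVE]
        if not specific:
            # Nothing specific recorded: keep the most common raw value.
            result[subject_id] = Counter(values).most_common(1)[0][0]
        elif len(set(specific)) == 1:
            # Every specific value agrees, so use it.
            result[subject_id] = specific[0]
        else:
            # Conflicting specific values: flag instead of picking one
            # ("MULTIPLE" is a hypothetical label).
            result[subject_id] = "MULTIPLE"
    return result


# Hypothetical (subject_id, ethnicity) rows, one per admission.
sample = [
    (1, "OTHER"),
    (1, "BLACK/AFRICAN AMERICAN"),
    (2, "UNKNOWN"),
    (2, "UNKNOWN"),
    (3, "WHITE"),
    (3, "ASIAN"),
]
reconciled = reconcile(sample)
print(reconciled)
```

Flagging conflicts instead of silently choosing a value keeps the inconsistency visible when reporting subgroup results.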

broganjb, Jan 21 '22 17:01

Yeah, so off the top of my head:

  • I'm pretty sure it was called ethnicity in MIMIC-II, so we've sort of kept the column name as a legacy.
  • We do not have any documentation of US Census style "ethnicity" in the raw data (maybe it exists, but it's not in our warehouse).
  • The column is recorded at each hospital admission, so unfortunately there is some inconsistency across a patient's admissions. We could settle on a "best" way to do this and have a query in this repo (I welcome suggestions here). The column was intentionally kept in the admissions table to allow for transparency in this.
  • We do aggregate values for deidentification purposes (some entries have only 1-2 individuals) and we will look into increasing the granularity here.

alistairewj, Jan 30 '22 15:01

I'm still fairly new to using the MIMIC datasets and a lot of the work I'm doing is around health disparities, so the race/ethnicity variables are ones I'm thinking deeply about. The approach I'm thinking of taking with this is just dealing with the race/ethnicity classifications in my R or Python code after I've already created my cohorts.

I'm working on writing some code that splits that column into separate race and ethnicity for each patient in my cohort based on the US Census recommendations and current thinking in the literature (see for example https://link.springer.com/article/10.1007/s40037-020-00602-3).

It would involve first finding all the admissions for a particular patient and seeing how the ethnicity variable is coded. If both ethnicity and race values appear at different admissions, as you mentioned, then I would use them to populate both columns; if not, the missing one would be coded as unknown.

Another option I've considered is creating a new table in my local copy of MIMIC-IV with subject_id, patient_race, and patient_ethnicity, based on the same code.
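
A rough sketch of what that split could look like, assuming the rows have been pulled into Python as (subject_id, ethnicity) pairs. The value groupings, the "MULTIPLE" and "UNKNOWN" labels, and the function name `split_race_ethnicity` are assumptions for illustration. Note that a patient with no HISPANIC/LATINO admission is coded as unknown ethnicity here, not as non-Hispanic, since the data do not say either way.

```python
from collections import defaultdict

# Assumption: HISPANIC/LATINO is the only ethnicity-style value in the column;
# the other specific values are treated as race. Spellings may need adjusting
# to match the local copy of the data.
ETHNICITY_VALUES = {"HISPANIC/LATINO"}
RACE_VALUES = {
    "WHITE",
    "BLACK/AFRICAN AMERICAN",
    "ASIAN",
    "AMERICAN INDIAN/ALASKA NATIVE",
}


def pick(values):
    """Collapse a set of candidate values to a single label."""
    if not values:
        return "UNKNOWN"
    if len(values) == 1:
        return next(iter(values))
    return "MULTIPLE"  # hypothetical label for conflicting values


def split_race_ethnicity(rows):
    """Build {subject_id: {"patient_race": ..., "patient_ethnicity": ...}}."""
    seen = defaultdict(set)
    for subject_id, value in rows:
        seen[subject_id].add(value)
    return {
        subject_id: {
            "patient_race": pick(values & RACE_VALUES),
            "patient_ethnicity": pick(values & ETHNICITY_VALUES),
        }
        for subject_id, values in seen.items()
    }


# Hypothetical (subject_id, ethnicity) rows, one per admission.
sample = [(1, "WHITE"), (1, "HISPANIC/LATINO"), (2, "ASIAN"), (3, "OTHER")]
table = split_race_ethnicity(sample)
print(table)
```

The resulting dictionary has one entry per subject_id, so it could be written out as the three-column table described above.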

Not sure if this is the best approach but it's what I'm thinking right now. Happy to consider any and all suggestions.

marymlucas, Feb 02 '22 03:02