AI_Clinician

Problem Reproducing Cohort

Open wjxgeorge opened this issue 5 years ago • 4 comments

I'm currently working on a Python version of the data preprocessing code, and I'm having trouble reproducing the cohort given in patientIDs_MIMIC3.csv.

Some hadmid values corresponding to an icustayid in patientIDs_MIMIC3.csv are not in the abx.csv file in the first place. For example, icustayid 55 corresponds to hadmid 147080, which is not returned even when I query PhysioNet's MIMIC-III database directly.

Can anyone reproduce it using the MATLAB code?

wjxgeorge avatar Jun 26 '19 03:06 wjxgeorge

Hi Nephalen, I think this repository gives clear, structured instructions for the MIMIC database: https://github.com/alistairewj/sepsis3-mimic

paulrich1234 avatar Jul 16 '19 07:07 paulrich1234

Hi Nephalen, I think this repository gives clear, structured instructions for the MIMIC database: https://github.com/alistairewj/sepsis3-mimic

I'm talking about the code in this repository. I've verified my MIMIC-III installation.

wjxgeorge avatar Jul 16 '19 10:07 wjxgeorge

Hi Nephalen, I am seeing the same issue with the provided MATLAB code. For example, ICU stay IDs 200035 and 299994 do not have any antibiotic prescriptions in the database (and thus do not meet the sepsis criteria), yet they are included in patientIDs_MIMIC3.csv.
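A minimal sketch of this kind of check, assuming you have already loaded the cohort IDs and the set of ICU stay IDs that have at least one antibiotic row. The function name and the ID 200001 are hypothetical and used only for illustration; 200035 and 299994 are the two stays mentioned above.

```python
def stays_without_antibiotics(cohort_ids, abx_stay_ids):
    """Return the cohort ICU stay IDs that have no antibiotic record, sorted."""
    abx = set(abx_stay_ids)
    return sorted(stay for stay in set(cohort_ids) if stay not in abx)


# Toy data: 200001 is a made-up stay that does have an antibiotic row.
cohort_ids = [200001, 200035, 299994]
abx_stay_ids = [200001]

missing = stays_without_antibiotics(cohort_ids, abx_stay_ids)
print(missing)
```

Any ID this returns is in the published cohort but could not have met an antibiotic-based inclusion criterion in your copy of the database.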

shengpu-tang avatar Sep 30 '19 18:09 shengpu-tang

Hi @Nephalen and @shengpu1126, I am seeing issues similar to the ones you reported.

  • After running all the provided code (with the fix from the pull request), I got a cohort of 16550 patients (compared to 17083 patients in patientIDs_MIMIC3.csv). I took the intersection of these two cohorts and got 0 common IDs (after adding the translation of 200,000 described in the README file).
  • I took the intersection of all patient IDs in the icustays table with the ones in patientIDs_MIMIC3.csv. There are also 0 common IDs.

A possible reason for the lack of common IDs is the translation of 200,000 added to the published IDs. The largest subject_id in MIMIC's icustays table is 99999, which is smaller than the translation of 200,000, so no translated ID can match a subject_id.

  • I suspect the published IDs are not patient identifiers but ICU stay identifiers, since the magic number 200,000 also appears in the MATLAB script. After realizing this, I compared the icustay_id values with the ones in patientIDs_MIMIC3.csv. This time I got 14965 common IDs, compared to the 20944 unique ICU stays I got from running the provided code on the MIMIC-III dataset. This means that some IDs in the provided CSV file are problematic.
  • I took the intersection of all ICU stay IDs in the icustays table with the ones in patientIDs_MIMIC3.csv. There are 17803 common IDs, which indicates that these values really could be ICU stay IDs.
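The comparisons above can be sketched as follows. This is only an illustration on toy data, assuming the published IDs must have 200,000 added before comparison; the function name and all ID values here are hypothetical, not taken from the repository.

```python
def overlap_with_offset(published_ids, table_ids, offset=200_000):
    """Count published IDs that appear in table_ids once offset is added."""
    shifted = {int(i) + offset for i in published_ids}
    return len(shifted & {int(i) for i in table_ids})


# Toy data (made up): subject_id never exceeds 99999,
# so a shifted published ID can never match one.
published = [3, 11, 55]
subject_ids = [3, 11, 99_999]
icustay_ids = [200_003, 200_011, 250_000]

print(overlap_with_offset(published, subject_ids))  # 0
print(overlap_with_offset(published, icustay_ids))  # 2
```

Running the same function against the real subject_id and icustay_id columns is a quick way to see which identifier the published file actually contains.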

In short, the PatientID values refer to icustayids, and in my experience running the provided code on the current MIMIC-III database yields a slightly different cohort. Please correct me if I did something wrong.

ZhiliangWu avatar Nov 04 '19 15:11 ZhiliangWu