mimic3-benchmarks
Datetime issues with preprocessing
If you do a fresh download of MIMIC-III, clone this repo, and then run the very first script, `python -m mimic3benchmark.scripts.extract_subjects ../../physionet.org/files/mimiciii/1.4/ data/root`, you will get the following error: `OverflowError: Overflow in int64 addition`. It is triggered in the `add_age_to_icustays()` method: simply computing `stays.INTIME - stays.DOB` raises it. The fix I found was to use `stays.INTIME.subtract(stays.DOB)` instead, like so:
stays["AGE"] = (
(stays.INTIME.subtract(stays.DOB)).apply(lambda s: s / np.timedelta64(1, "s"))
/ 60.0
/ 60
/ 24
/ 365
)
Specs: pandas 1.1.3, Python 3.7.9
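A likely cause of the overflow: pandas stores `datetime64[ns]` differences as int64 nanoseconds, which caps a timedelta at roughly 292 years, while MIMIC-III shifts the date of birth of patients older than 89 so that they appear to be about 300 years old at admission. The sketch below uses hypothetical dates (not real MIMIC-III rows) to compute that limit with NumPy and to show that plain Python `datetime` subtraction handles a 300-year gap without overflowing.

```python
from datetime import datetime

import numpy as np

# Largest span a timedelta64[ns] (int64 nanoseconds) can hold, in years.
max_ns_years = np.iinfo(np.int64).max / 1e9 / 3600 / 24 / 365.25

# Hypothetical shifted dates: DOB about 300 years before ICU admission.
intime = datetime(2150, 1, 1)
dob = datetime(1850, 1, 1)

# datetime.timedelta counts days internally, so this does not overflow.
age = (intime - dob).total_seconds() / 3600.0 / 24.0 / 365.0
print(f"timedelta64[ns] limit: {max_ns_years:.1f} years; age: {age:.1f} years")
```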
Additional issues:
`get_events_for_stay()` has an issue where you get `TypeError: Invalid comparison between dtype=datetime64[ns] and DatetimeProperties`. It turns out that `intime` and `outtime`, which are passed in using the `.iloc[]` method on lines 84 and 85 of `extract_episodes_from_subjects.py`, are `DatetimeProperties`. To fix this, I changed those two lines to:
intime = stays.INTIME.iloc[i].date[0]
outtime = stays.OUTTIME.iloc[i].date[0]
Then, inside `get_events_for_stay()`, I changed the line inside the `if` statement to:
idx = idx | (
(events.CHARTTIME.dt.date >= intime) & (events.CHARTTIME.dt.date <= outtime)
)
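For comparison, here is a minimal sketch of the stay-window filter, assuming `intime` and `outtime` arrive as scalar `Timestamp`s, which is what `.iloc[i]` returns for a single row of a `datetime64` column. `events_in_window` is a hypothetical helper, not the repo's `get_events_for_stay()`, and the events are made up. It also shows a side effect of comparing `.dt.date` values: the window widens to whole calendar days.

```python
import pandas as pd


def events_in_window(events, intime, outtime, by_date=False):
    """Hypothetical helper: rows of `events` whose CHARTTIME lies in [intime, outtime]."""
    if by_date:
        # Whole-calendar-day comparison, similar to the workaround above.
        t = events.CHARTTIME.dt.date
        idx = (t >= intime.date()) & (t <= outtime.date())
    else:
        # Exact comparison: datetime64 Series vs scalar Timestamp.
        idx = (events.CHARTTIME >= intime) & (events.CHARTTIME <= outtime)
    return events[idx]


stays = pd.DataFrame({
    "INTIME": pd.to_datetime(["2150-01-01 12:00"]),
    "OUTTIME": pd.to_datetime(["2150-01-02 12:00"]),
})
events = pd.DataFrame({
    "CHARTTIME": pd.to_datetime([
        "2150-01-01 08:00", "2150-01-01 14:00",
        "2150-01-02 10:00", "2150-01-03 09:00",
    ]),
    "VALUE": [1, 2, 3, 4],
})

# .iloc[i] on a datetime64 column yields a scalar Timestamp, not a .dt accessor.
intime, outtime = stays.INTIME.iloc[0], stays.OUTTIME.iloc[0]
exact = events_in_window(events, intime, outtime)
daily = events_in_window(events, intime, outtime, by_date=True)
print(exact.VALUE.tolist(), daily.VALUE.tolist())
```

With exact timestamps only the 14:00 and next-day 10:00 events fall inside the stay; the date-based comparison also keeps the 08:00 event charted before admission.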
Another issue: a similar date subtraction error occurs in `add_hours_elpased_to_events` (the function name also misspells "elapsed"). I changed `events.CHARTTIME - dt` to `events.CHARTTIME.dt.date.subtract(dt)` to fix it.
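A sketch of computing elapsed hours while keeping the full timestamp, assuming `events.CHARTTIME` is a `datetime64` column and `dt` is a single reference time such as the stay's `INTIME`. `add_hours_elapsed` is a hypothetical name, not the repo function. Dividing a timedelta Series by `pd.Timedelta(hours=1)` yields fractional hours; spans within one ICU stay are far below the ~292-year `datetime64[ns]` limit, whereas converting with `.dt.date` first would discard the time of day.

```python
import pandas as pd


def add_hours_elapsed(events: pd.DataFrame, dt) -> pd.DataFrame:
    """Hypothetical sketch: add HOURS = time from reference `dt` to each CHARTTIME."""
    events = events.copy()
    # Timestamp arithmetic keeps the time of day; dividing by one hour gives floats.
    events["HOURS"] = (events.CHARTTIME - pd.Timestamp(dt)) / pd.Timedelta(hours=1)
    return events


events = pd.DataFrame({
    "CHARTTIME": pd.to_datetime(["2150-01-01 14:30", "2150-01-02 12:00"]),
})
out = add_hours_elapsed(events, "2150-01-01 12:00")
print(out.HOURS.tolist())
```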
Thanks for suggesting these fixes. I've also faced these issues.
Thanks for raising the issue. This is now solved like this:
stays['AGE'] = stays.apply(lambda e: (e['INTIME'].to_pydatetime()
                                      - e['DOB'].to_pydatetime()).total_seconds() / 3600.0 / 24.0 / 365.0,
                           axis=1)
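The row-wise `apply` above calls the lambda once per stay. As a possible vectorized alternative (a sketch, not what the repository ships), the subtraction can be done in NumPy at second resolution, where an int64 `timedelta64[s]` spans hundreds of billions of years. `add_age_seconds_resolution` and the sample dates are hypothetical; the result uses the same `/ 365.0` convention as the fix above.

```python
import numpy as np
import pandas as pd


def add_age_seconds_resolution(stays: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical vectorized alternative: subtract at second resolution."""
    stays = stays.copy()
    # Downcast ns -> s so the int64 difference cannot overflow for ~300-year gaps.
    intime = stays["INTIME"].to_numpy().astype("datetime64[s]")
    dob = stays["DOB"].to_numpy().astype("datetime64[s]")
    seconds = (intime - dob) / np.timedelta64(1, "s")
    stays["AGE"] = seconds / 3600.0 / 24.0 / 365.0
    return stays


stays = pd.DataFrame({
    "INTIME": pd.to_datetime(["2150-01-01", "2150-01-01"]),
    "DOB": pd.to_datetime(["2100-01-01", "1850-01-01"]),  # second row: shifted DOB
})
print(add_age_seconds_resolution(stays)["AGE"].round(1).tolist())
```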
I have also specified which pandas version is expected (though this code works with newer versions too). Please see the updated README file.