mimic3-benchmarks

Datetime issues with preprocessing

Open davzaman opened this issue 4 years ago • 3 comments

If you do a fresh download of MIMIC3, clone this repo, and run the very first script, python -m mimic3benchmark.scripts.extract_subjects ../../physionet.org/files/mimiciii/1.4/ data/root, you get the following error: OverflowError: Overflow in int64 addition. It is triggered in the add_age_to_icustays() method: simply evaluating stays.INTIME - stays.DOB raises it. The fix I found was to use stays.INTIME.subtract(stays.DOB) instead, like so:

stays["AGE"] = (
    stays.INTIME.subtract(stays.DOB).apply(lambda s: s / np.timedelta64(1, "s"))
    / 60.0
    / 60
    / 24
    / 365
)

Specs: Pandas version 1.1.3 Python version 3.7.9
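
For context, a minimal sketch of why a plain subtraction can overflow here. It assumes, as I understand the MIMIC-III documentation, that DOB is shifted to roughly 300 years before admission for patients older than 89, and it uses made-up dates (1850 and 2150) rather than real rows. A nanosecond-resolution timedelta stored in a signed 64-bit integer tops out at about 292 years, so a 300-year gap does not fit, while Python's own datetime arithmetic has no such limit:

```python
from datetime import datetime

# Largest span a signed 64-bit nanosecond counter can represent.
INT64_MAX_NS = 2**63 - 1
NS_PER_YEAR = 365 * 24 * 3600 * 10**9
max_years = INT64_MAX_NS / NS_PER_YEAR  # roughly 292.5 years

# Hypothetical stay: DOB shifted about 300 years before admission.
dob = datetime(1850, 1, 1)
intime = datetime(2150, 1, 1)

# Python timedeltas store days/seconds separately, so this never overflows.
delta = intime - dob
gap_ns = (delta.days * 86_400 + delta.seconds) * 10**9  # exact Python int
overflows_int64 = gap_ns > INT64_MAX_NS

# Same unit conversion as in the issue: seconds -> hours -> days -> years.
age_years = delta.total_seconds() / 3600.0 / 24.0 / 365.0

print(f"int64 ns limit: {max_years:.1f} years")
print(f"gap exceeds int64 ns: {overflows_int64}")
print(f"age: {age_years:.1f} years")
```

This only illustrates the arithmetic limit; whether a given pandas version raises on the subtraction depends on the datetime resolution it uses for the columns.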

Nov 09 '20 21:11 davzaman

Additional issues: get_events_for_stay() raises TypeError: Invalid comparison between dtype=datetime64[ns] and DatetimeProperties. It turns out that intime and outtime, which are obtained with .iloc[] on lines 84 and 85 of extract_episodes_from_subjects.py and then passed in, are DatetimeProperties objects. To fix this, I changed those two lines to:

        intime = stays.INTIME.iloc[i].date[0]
        outtime = stays.OUTTIME.iloc[i].date[0]

And then inside the method get_events_for_stay() I changed the line inside the if statement to:

        idx = idx | (
            (events.CHARTTIME.dt.date >= intime) & (events.CHARTTIME.dt.date <= outtime)
        )
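
One thing to be aware of with the date-based comparison above: it matches on calendar days, so events charted on the admission day but before the admission time are also selected. The sketch below contrasts the two on made-up data. The column name CHARTTIME comes from the thread; the values, the VALUE column, and the assumption that intime and outtime are available as scalar pd.Timestamp objects are mine, and that assumption may not hold in the environment where the TypeError appeared.

```python
import pandas as pd

# Synthetic events; only the CHARTTIME column name is taken from the thread.
events = pd.DataFrame({
    "CHARTTIME": pd.to_datetime([
        "2150-01-01 08:00",  # same day as admission, but before intime
        "2150-01-01 12:00",  # inside the stay
        "2150-01-03 10:00",  # after discharge
    ]),
    "VALUE": [1.0, 2.0, 3.0],
})

# Assumed scalar timestamps for one stay.
intime = pd.Timestamp("2150-01-01 10:00")
outtime = pd.Timestamp("2150-01-02 00:00")

# Timestamp-level window: keeps time-of-day precision.
in_window = (events.CHARTTIME >= intime) & (events.CHARTTIME <= outtime)

# Date-level window: compares calendar days only.
by_date = (events.CHARTTIME.dt.date >= intime.date()) & (
    events.CHARTTIME.dt.date <= outtime.date()
)

print("timestamp-level matches:", int(in_window.sum()))
print("date-level matches:", int(by_date.sum()))
```

If the scalar timestamps can be recovered, the timestamp-level mask is the stricter of the two; the date-level mask is a coarser workaround.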

Nov 09 '20 22:11 davzaman

Another issue: there is another date subtraction problem in add_hours_elpased_to_events (the name also has a typo, "elpased" for "elapsed"). I changed events.CHARTTIME - dt to events.CHARTTIME.dt.date.subtract(dt) to fix it.

Nov 09 '20 22:11 davzaman

Thanks for suggesting these fixes. I've also faced these issues.

Dec 27 '20 09:12 birdx0810

Thanks for raising the issue. This is now solved like this:

stays['AGE'] = stays.apply(lambda e: (e['INTIME'].to_pydatetime()
                                          - e['DOB'].to_pydatetime()).total_seconds() / 3600.0 / 24.0 / 365.0,
                               axis=1)

I have also specified which pandas version is expected (though this code works with newer versions too). Please see the updated README file.
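
For anyone who wants to check the row-wise approach in isolation, here is a small sketch on synthetic data. The column names INTIME, DOB and AGE match the snippet above; the two rows are invented, with the first mimicking a DOB placed about 300 years before admission.

```python
import pandas as pd

# Synthetic stays; dates are made up for illustration only.
stays = pd.DataFrame({
    "INTIME": pd.to_datetime(["2150-01-01", "2120-06-15"]),
    "DOB": pd.to_datetime(["1850-01-01", "2060-06-15"]),
})

# Convert each pandas Timestamp to a Python datetime before subtracting,
# so the difference is a datetime.timedelta rather than an int64 nanosecond count.
stays["AGE"] = stays.apply(
    lambda e: (e["INTIME"].to_pydatetime() - e["DOB"].to_pydatetime()).total_seconds()
    / 3600.0 / 24.0 / 365.0,
    axis=1,
)

print(stays["AGE"].round(1).tolist())
```

The row-wise apply is slower than a vectorized subtraction, which is the trade-off for avoiding the nanosecond range limit.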

Apr 14 '23 06:04 hrayrhar