mimic-code
Significantly fewer matches for fluid administration (electrolytes) for new mimic-iv 3.1 `anchor_year_group` = `"2020 - 2022"` data
Prerequisites
- [x] Put an X between the brackets on this line if you have done all of the following:
- Checked the online documentation: https://mimic.mit.edu/
- Checked that your issue isn't already addressed: https://github.com/MIT-LCP/mimic-code/issues?utf8=%E2%9C%93&q=
Description
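We observe significantly fewer matches for fluid administration in the inputevents table of the new mimic-iv version 3.1 data for anchor_year_group = "2020 - 2022". The script below counts the inputevents rows whose itemid appears in a local list of fluid itemids (fluid.csv), per anchor_year_group: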
```python
import gzip
from pathlib import Path

import pandas as pd

path_new = Path("/path/to/miiv/")

# ICU stays joined with patient-level columns (including anchor_year_group)
with gzip.open(path_new / "icu" / "icustays.csv.gz") as f:
    icustays_new = pd.read_csv(f)
with gzip.open(path_new / "hosp" / "patients.csv.gz") as f:
    patients_new = pd.read_csv(f)
merged_new = pd.merge(
    left=icustays_new,
    right=patients_new,
    on="subject_id",
    how="left",
    validate="m:1",
)

# itemids of the fluids of interest
fluids = pd.read_csv("fluid.csv")

# Read inputevents in chunks, keeping only rows with a fluid itemid
with gzip.open(path_new / "icu" / "inputevents.csv.gz") as f:
    iter_csv = pd.read_csv(f, chunksize=10000, usecols=["stay_id", "itemid"])
    inputevents_new = pd.concat([c[c["itemid"].isin(fluids["itemid"])] for c in iter_csv])
print(f"inputevents_new: {len(inputevents_new)}")

# One row per matching input event; stays without matches get a NaN itemid,
# which groupby drops by default
merged = pd.merge(
    left=merged_new,
    right=inputevents_new,
    on="stay_id",
    how="left",
    validate="1:m",
)

print("\nraw counts")
sized = merged.groupby(["anchor_year_group", "itemid"]).size()
print(sized.reset_index().pivot(columns="anchor_year_group", index="itemid"))

print("\ncounts by los")
# Divide by the total ICU length of stay (days) of each anchor_year_group
sized = sized / merged_new.groupby(["anchor_year_group"])["los"].sum()
print(sized.reset_index().pivot(columns="anchor_year_group", index="itemid"))
```
This prints:
```
inputevents_new: 2395955

raw counts
                            0
anchor_year_group 2008 - 2010 2011 - 2013 2014 - 2016 2017 - 2019 2020 - 2022
itemid
225158.0               467118      302927      321691      338959      147961
225159.0                 5382        3104        1659         981         405
225161.0                 1218        1362        3494        3480        1289
225943.0               185124      137696      154092      157131       65449
225944.0                27006       16943       18161       22241       10697
228341.0                   25          64         143         105          48

counts by los
                            0
anchor_year_group 2008 - 2010 2011 - 2013 2014 - 2016 2017 - 2019 2020 - 2022
itemid
225158.0             4.608710    4.666950    4.915976    5.411347    3.051171
225159.0             0.053100    0.047821    0.025352    0.015661    0.008352
225161.0             0.012017    0.020983    0.053394    0.055557    0.026581
225943.0             1.826483    2.121370    2.354783    2.508535    1.349654
225944.0             0.266448    0.261027    0.277530    0.355069    0.220588
228341.0             0.000247    0.000986    0.002185    0.001676    0.000990
```
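The "counts by los" table divides each (anchor_year_group, itemid) count by the summed `los` (ICU length of stay, in days) of all stays in that anchor_year_group. A minimal toy sketch of that normalization step, with made-up DataFrames standing in for `merged_new` and `inputevents_new`; it makes the level alignment explicit with `Series.div(..., level=...)` and uses `unstack` instead of `reset_index().pivot(...)`, which avoids the extra `0` column level seen above:

```python
import pandas as pd

# Hypothetical toy data: one row per ICU stay (stands in for merged_new)
stays = pd.DataFrame({
    "stay_id": [1, 2, 3],
    "anchor_year_group": ["2017 - 2019", "2017 - 2019", "2020 - 2022"],
    "los": [2.0, 3.0, 4.0],  # ICU length of stay in days
})
# Hypothetical toy data: one row per matching input event
events = pd.DataFrame({
    "stay_id": [1, 1, 2, 3, 3],
    "itemid": [225158, 225158, 225943, 225158, 225943],
})
merged = stays.merge(events, on="stay_id", how="left", validate="1:m")

# Raw counts per (anchor_year_group, itemid)
counts = merged.groupby(["anchor_year_group", "itemid"]).size()

# Total ICU days per group, taken from the stay-level table so that each
# stay's los is counted once (summing los on `merged` would overcount)
icu_days = stays.groupby("anchor_year_group")["los"].sum()

# Divide each count by its group's ICU days, aligning on the named level
rate = counts.div(icu_days, level="anchor_year_group")

# itemid rows x anchor_year_group columns, without an extra "0" level
table = rate.unstack("anchor_year_group")
print(table)
```

In this toy example the 2017 - 2019 group has 5 ICU days, so its two 225158 events give a rate of 0.4 per ICU day.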
Note that there are only about half as many matches for each itemid for anchor_year_group = "2020 - 2022" as for "2017 - 2019", both in absolute numbers and when normalized by total ICU length of stay (matches per ICU day).
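To quantify the drop, the per-ICU-day rates from the "counts by los" output can be compared directly. The sketch below copies those values by hand and divides the 2020 - 2022 rate by the 2017 - 2019 rate for each itemid:

```python
# Per-ICU-day rates copied from the "counts by los" output above
rates_2017_2019 = {
    225158: 5.411347,
    225159: 0.015661,
    225161: 0.055557,
    225943: 2.508535,
    225944: 0.355069,
    228341: 0.001676,
}
rates_2020_2022 = {
    225158: 3.051171,
    225159: 0.008352,
    225161: 0.026581,
    225943: 1.349654,
    225944: 0.220588,
    228341: 0.000990,
}

# Ratio of the 2020 - 2022 rate to the 2017 - 2019 rate for each itemid
ratios = {item: rates_2020_2022[item] / rates_2017_2019[item] for item in rates_2017_2019}
for item, ratio in ratios.items():
    print(f"{item}: {ratio:.2f}")
```

Every ratio falls between roughly 0.48 (itemid 225161) and 0.62 (itemid 225944).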
Is this expected? Fluid management is an important task in the ICU. We were surprised to see such a drop in matches for the new mimic-iv 3.1 data.