
How much of MIMIC-IV is MIMIC-III?

Open tsteffek opened this issue 1 year ago • 2 comments

Prerequisites

  • [X] Put an X between the brackets on this line if you have done all of the following:
    • Checked the online documentation: https://mimic.mit.edu/
    • Checked that your issue isn't already addressed: https://github.com/MIT-LCP/mimic-code/issues?utf8=%E2%9C%93&q=

Description

There are multiple issues about how much MIMIC-III and MIMIC-IV overlap. Most of them aimed to merge the two datasets, a problem that is now solved by the CareVue dataset. However, I had trouble finding a good solution for essentially the opposite operation: instead of merging III and IV, I'm looking for a way to remove the data from IV that was already in III. The background is that our team trained ML models on III and would now like to verify our results on unseen IV data. I'm sure you're all aware of the problem that results from evaluating on data that was used for training.

Since there does not seem to be a good way of linking the two datasets to filter for data in III, the question arises: how much of MIMIC-IV consists of MIMIC-III data? Does the 2008-2012 period in MIMIC-IV purely or mainly consist of MIMIC-III data, or was additional data added in that time period?

Also, to verify: is my assumption correct that, after adjusting for the anchor year shift, all data up to and including 2014 is potentially contaminated because of the anchor year group size?
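
To make the anchor-year reasoning concrete, here is a minimal sketch of how the possible real calendar years of a shifted date could be bounded. It assumes the `anchor_year` and `anchor_year_group` columns of the MIMIC-IV `patients` table, with groups formatted like `"2011 - 2013"`. The function names and the `mimic_iii_last_year=2012` cutoff are my own assumptions for illustration, not anything provided by mimic-code, and the check is deliberately conservative: it flags anything that could overlap, not what actually does.

```python
from typing import Tuple


def real_year_bounds(anchor_year: int, anchor_year_group: str,
                     shifted_year: int) -> Tuple[int, int]:
    """Return the (earliest, latest) possible real year for a shifted year.

    anchor_year is the patient's shifted anchor year, anchor_year_group is
    the real three-year range it maps to (e.g. "2011 - 2013"), and
    shifted_year is the shifted year of the event of interest.
    """
    low_str, high_str = anchor_year_group.split("-")
    group_low, group_high = int(low_str), int(high_str)
    # Years elapsed between the patient's anchor year and the event.
    offset = shifted_year - anchor_year
    return group_low + offset, group_high + offset


def possibly_in_mimic_iii(anchor_year: int, anchor_year_group: str,
                          shifted_year: int,
                          mimic_iii_last_year: int = 2012) -> bool:
    """Conservative flag: True if the event COULD fall in the MIMIC-III period.

    mimic_iii_last_year is an assumed cutoff; adjust it as needed.
    """
    earliest, _ = real_year_bounds(anchor_year, anchor_year_group, shifted_year)
    return earliest <= mimic_iii_last_year


# Hypothetical patient: anchor year 2150, anchor group 2011-2013.
print(real_year_bounds(2150, "2011 - 2013", 2151))       # (2012, 2014)
print(possibly_in_mimic_iii(2150, "2011 - 2013", 2151))  # True
print(possibly_in_mimic_iii(2150, "2011 - 2013", 2152))  # False
```

Under these assumptions the width of the anchor year group is what widens the contaminated window: an event can only be ruled out once even its earliest possible real year lies past the cutoff.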

Similar Issues

  • https://github.com/MIT-LCP/mimic-code/issues/1331
  • https://github.com/MIT-LCP/mimic-code/issues/815
  • https://github.com/MIT-LCP/mimic-code/issues/994

tsteffek, Apr 23 '23 21:04