
Duplicates of emar_detail rows in MIMIC-IV v2.0

Open alexmbennett2 opened this issue 2 years ago • 0 comments

Prerequisites

  • [X] Put an X between the brackets on this line if you have done all of the following:
    • Checked the online documentation: https://mimic.mit.edu/
    • Checked that your issue isn't already addressed: https://github.com/MIT-LCP/mimic-code/issues?utf8=%E2%9C%93&q=

Description

Hello! I was working with emar_detail in MIMIC-IV v2.0 and noticed that it contains some duplicate rows (~3,700). These may come from human input error or from the deidentification process.

I ran the following query to find the emar_id/parent_field_ordinal combinations that appear more than once:

-- List every (emar_id, parent_field_ordinal) pair that occurs in more than one row
SELECT
    emar_id
    , parent_field_ordinal
FROM mimic_hosp.emar_detail
GROUP BY
    emar_id
    , parent_field_ordinal
HAVING count(*) > 1;

I then validated that the rows were truly identical by pulling the full rows for a couple of the emar_id/parent_field_ordinal combinations and comparing them. It would be nice to have these duplicate rows removed in a future release if possible.
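The spot check above could be extended to every duplicated key. The following is only a minimal sketch: it uses Python's built-in sqlite3 with a tiny in-memory toy table instead of the real mimic_hosp.emar_detail, keeps just two non-key columns for illustration, and all of the data values are made up. The idea is to group by every column first, then count how many distinct row variants each key has. A key with n_variants = 1 consists of exact duplicates; n_variants > 1 means the rows share a key but differ in content.

```python
import sqlite3

# Toy stand-in for mimic_hosp.emar_detail (assumption: only two non-key
# columns are kept here; the real table has many more).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emar_detail ("
    " emar_id TEXT, parent_field_ordinal REAL,"
    " dose_given TEXT, dose_given_unit TEXT)"
)
toy_rows = [
    ("10-1", 1.0, "5", "mg"),   # exact duplicate pair
    ("10-1", 1.0, "5", "mg"),
    ("10-2", 1.0, "10", "mg"),  # same key, different content
    ("10-2", 1.0, "20", "mg"),
    ("10-3", 1.0, "1", "TAB"),  # key appears only once
]
conn.executemany("INSERT INTO emar_detail VALUES (?, ?, ?, ?)", toy_rows)
conn.commit()

# Step 1: collapse fully identical rows and remember how many there were.
# Step 2: per key, total the rows and count the distinct variants.
CHECK_QUERY = """
WITH full_rows AS (
    SELECT emar_id, parent_field_ordinal, dose_given, dose_given_unit,
           COUNT(*) AS n
    FROM emar_detail
    GROUP BY emar_id, parent_field_ordinal, dose_given, dose_given_unit
)
SELECT emar_id, parent_field_ordinal,
       SUM(n)   AS n_rows,
       COUNT(*) AS n_variants
FROM full_rows
GROUP BY emar_id, parent_field_ordinal
HAVING SUM(n) > 1
ORDER BY emar_id
"""

duplicate_keys = conn.execute(CHECK_QUERY).fetchall()
for emar_id, ordinal, n_rows, n_variants in duplicate_keys:
    kind = "exact duplicates" if n_variants == 1 else "conflicting rows"
    print(emar_id, ordinal, n_rows, kind)
```

Against the real table, the GROUP BY in the first step would need to list all of the emar_detail columns. GROUP BY treats NULLs as equal, so rows that match only because both values are missing still count as identical.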

I did notice that the emar_detail primary key is commented out in the constraints file. Was it commented out for a particular reason?
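Whatever the actual reason, a uniqueness constraint cannot be created while duplicate keys are present. The sketch below illustrates this with Python's sqlite3 and a made-up toy table. It assumes, without having confirmed it against the constraints file, that the key in question is (emar_id, parent_field_ordinal); the helper try_add_unique_key and the emar_detail_dedup table are hypothetical names used only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-in for emar_detail with a single non-key column (assumption).
conn.execute(
    "CREATE TABLE emar_detail ("
    " emar_id TEXT, parent_field_ordinal REAL, dose_given TEXT)"
)
conn.executemany(
    "INSERT INTO emar_detail VALUES (?, ?, ?)",
    [
        ("10-1", 1.0, "5"),   # exact duplicate pair
        ("10-1", 1.0, "5"),
        ("10-2", 1.0, "10"),
    ],
)
conn.commit()


def try_add_unique_key(conn, table):
    """Try to enforce uniqueness on the assumed key; return True on success."""
    try:
        conn.execute(
            f"CREATE UNIQUE INDEX {table}_key "
            f"ON {table} (emar_id, parent_field_ordinal)"
        )
        return True
    except sqlite3.DatabaseError:
        # SQLite refuses to build the index when duplicate keys exist.
        return False


ok_before = try_add_unique_key(conn, "emar_detail")

# Drop exact duplicates by copying the distinct rows into a new table.
conn.execute("CREATE TABLE emar_detail_dedup AS SELECT DISTINCT * FROM emar_detail")
ok_after = try_add_unique_key(conn, "emar_detail_dedup")

n_dedup = conn.execute("SELECT COUNT(*) FROM emar_detail_dedup").fetchone()[0]
print(ok_before, ok_after, n_dedup)
```

SELECT DISTINCT only removes rows that are identical in every column, so this approach works only if all of the duplicated keys turn out to be exact duplicates.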

Thanks for all the hard work maintaining MIMIC, it's great!

alexmbennett2 • Jun 24 '22 17:06