
[CS598] Read MIMIC-III nurse notes and parse the data

Open NunuSnowman opened this issue 8 months ago • 1 comments

The MIMIC-III nurse note data is used for de-identification ML tasks.

The dataset contains the raw nurse notes as well as a masked version of each note.

id.res file

START_OF_RECORD=1||||1||||
Hello World [**Name 1**] from [**Organization**]. 

||||END_OF_RECORD

id.text file

START_OF_RECORD=1||||1||||
Hello World Lixin from UIUC. 

||||END_OF_RECORD
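
For illustration, a minimal sketch of reading this record layout might look like the following. It is not the code from this issue: the function name `parse_records` is hypothetical, and the layout (two numeric IDs in the header, body terminated by `||||END_OF_RECORD`) is inferred only from the two samples above.

```python
import re

# Assumed layout, inferred from the samples above:
#   START_OF_RECORD=<id>||||<id>||||
#   <body, possibly spanning several lines>
#   ||||END_OF_RECORD
RECORD_RE = re.compile(
    r"START_OF_RECORD=(\d+)\|\|\|\|(\d+)\|\|\|\|\n(.*?)\n?\|\|\|\|END_OF_RECORD",
    re.DOTALL,
)


def parse_records(content):
    """Return a dict mapping the (id, id) header pair to the stripped body.

    The two header numbers are treated as an opaque key; their meaning
    is not stated in this issue.
    """
    records = {}
    for match in RECORD_RE.finditer(content):
        key = (int(match.group(1)), int(match.group(2)))
        records[key] = match.group(3).strip()
    return records
```

Because both files use the same header, parsing each file with the same function yields two dicts that can be joined on the header pair.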

My code parses all records in a single pair of id.text and id.res files, matches each mask to the corresponding original text, and records the index range of that text in the original note.
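
One possible way to do the mask-to-text matching described above is sketched below. This is an assumption about the approach, not the code from this issue: the helper name `align_masks` is hypothetical, and it relies on the unmasked parts of the two notes being character-for-character identical. Each `[**...**]` mask is turned into a lazy capture group, so the result can be ambiguous when two masks are adjacent or the surrounding text repeats.

```python
import re

# Matches a mask such as [**Name 1**] and captures its label.
MASK_RE = re.compile(r"\[\*\*(.*?)\*\*\]")


def align_masks(masked, original):
    """Align each mask in `masked` with the text it hides in `original`.

    Returns a list of (label, hidden_text, start, end) tuples, where
    start/end are character offsets into `original`, or None if the
    two notes cannot be aligned.
    """
    pattern_parts = []
    labels = []
    pos = 0
    for mask in MASK_RE.finditer(masked):
        # Literal text before the mask must appear unchanged in the original.
        pattern_parts.append(re.escape(masked[pos:mask.start()]))
        # The mask itself stands for some non-empty original text.
        pattern_parts.append("(.+?)")
        labels.append(mask.group(1))
        pos = mask.end()
    pattern_parts.append(re.escape(masked[pos:]))

    match = re.fullmatch("".join(pattern_parts), original, re.DOTALL)
    if match is None:
        return None
    return [
        (label, match.group(i), match.start(i), match.end(i))
        for i, label in enumerate(labels, start=1)
    ]
```

For the sample record, `align_masks("Hello World [**Name 1**] from [**Organization**].", "Hello World Lixin from UIUC.")` gives `("Name 1", "Lixin", 12, 17)` and `("Organization", "UIUC", 23, 27)`.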

NunuSnowman avatar May 08 '25 00:05 NunuSnowman

I am going to clean up some code and make it more readable before the end of today.

NunuSnowman avatar May 08 '25 00:05 NunuSnowman

Closing this PR as it lacks proper labeling. Please add appropriate labels and reopen if needed.

jhnwu3 avatar Aug 04 '25 01:08 jhnwu3