PyHealth
[CS598] Read MIMIC-III nurse notes and parse the data
The MIMIC-III nurse notes data is used for de-identification ML tasks.
The dataset contains raw nurse notes data as well as a masked version of nurse notes.
id.res file
START_OF_RECORD=1||||1||||
Hello World [**Name 1**] from [**Organization**].
||||END_OF_RECORD
id.text file
START_OF_RECORD=1||||1||||
Hello World Lixin from UIUC.
||||END_OF_RECORD
My code parses all records in a single pair of id.text and id.res files, matches each mask to the original text it replaced, and finds the index range of that text in the original note.
I am going to clean up some code and make it more readable before the end of today.
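For reviewers, here is a minimal sketch of the matching idea, not the code in this PR. It assumes records are delimited exactly as in the samples above, that the two numbers in the header are opaque record identifiers, and that the text outside the masks is identical in both files. The names `parse_records` and `align_masks` are hypothetical.

```python
import re

# Assumed record layout, taken from the samples above:
# START_OF_RECORD=<id>||||<id>||||, the note body, then ||||END_OF_RECORD
RECORD_RE = re.compile(
    r"START_OF_RECORD=(\d+)\|\|\|\|(\d+)\|\|\|\|\n(.*?)\n?\|\|\|\|END_OF_RECORD",
    re.DOTALL,
)
# A mask looks like [**Name 1**]; the capture group is the mask label.
MASK_RE = re.compile(r"\[\*\*(.*?)\*\*\]")


def parse_records(content):
    """Map (first_id, second_id) from each record header to the record body."""
    return {(int(a), int(b)): body for a, b, body in RECORD_RE.findall(content)}


def align_masks(masked, original):
    """Return (label, original_text, start, end) for every mask in `masked`.

    The masked text is turned into a regex: literal stretches are escaped
    and each mask becomes a lazy capture group. Matching that regex against
    the original text yields the hidden span and its index range.
    Returns an empty list when the literal parts of the two texts differ.
    """
    parts = MASK_RE.split(masked)       # [literal, label, literal, label, ...]
    literals, labels = parts[0::2], parts[1::2]
    pattern = "(.+?)".join(re.escape(lit) for lit in literals)
    match = re.fullmatch(pattern, original, re.DOTALL)
    if match is None:
        return []
    return [
        (label, match.group(i), match.start(i), match.end(i))
        for i, label in enumerate(labels, start=1)
    ]


res = "START_OF_RECORD=1||||1||||\nHello World [**Name 1**] from [**Organization**].\n||||END_OF_RECORD\n"
text = "START_OF_RECORD=1||||1||||\nHello World Lixin from UIUC.\n||||END_OF_RECORD\n"

masked_records = parse_records(res)
original_records = parse_records(text)
for key, masked_body in masked_records.items():
    print(key, align_masks(masked_body, original_records[key]))
# (1, 1) [('Name 1', 'Lixin', 12, 17), ('Organization', 'UIUC', 23, 27)]
```

The end index is exclusive, so `original[start:end]` returns the original text behind the mask. Two adjacent masks with no literal text between them are ambiguous under this approach, and notes with many masks may be slow to match because of regex backtracking.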
Closing this PR as it lacks proper labeling. Please add appropriate labels and reopen if needed.