added timestamp formats required by mimic3, and fixed storageapis so …

Open jhnwu3 opened this issue 8 months ago • 0 comments

…date parsing should work better now. Logging has also been expanded for transparency.

This pull request improves dataset handling, task implementation, and testing in the pyhealth library. Key changes include updated timestamp handling, new attributes and joins in the dataset configurations, and a refactor of the medical coding task that improves data filtering and logging. The most important changes are summarized below by category:

Dataset Handling Improvements:

  • Enabled ordered uniqueness in collected_global_event_df by passing maintain_order=True to the .unique() method. This keeps the ordering of patient IDs consistent in development mode. (pyhealth/datasets/base_dataset.py, L146-R146)
  • Added support for non-strict timestamp parsing and more detailed logging for timestamp-related operations in load_table. (pyhealth/datasets/base_dataset.py, R229-R244)
  • Updated the MIMIC-III dataset configuration to include timestamp_format for tables and added hadm_id as an attribute in multiple tables. Introduced a join of NOTEEVENTS with ADMISSIONS to enrich the note data. (pyhealth/datasets/configs/mimic3.yaml)
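
To make the first two items concrete, here is a minimal plain-Python sketch of the two behaviors: order-preserving de-duplication and non-strict timestamp parsing. It is an analogue only and does not reproduce the code in base_dataset.py; the helper names `ordered_unique` and `parse_timestamp`, the sample IDs, and the format string are assumptions made for illustration.

```python
import logging
from datetime import datetime
from typing import Iterable, List, Optional

logger = logging.getLogger(__name__)


def ordered_unique(values: Iterable[str]) -> List[str]:
    """Drop duplicates while keeping first-seen order.

    Hypothetical helper: dicts preserve insertion order, so the result is
    the same on every run, which is the property maintain_order=True is
    described as providing above.
    """
    return list(dict.fromkeys(values))


def parse_timestamp(raw: Optional[str], fmt: str) -> Optional[datetime]:
    """Non-strict parse: return None for a bad value instead of raising."""
    if raw is None:
        return None
    try:
        return datetime.strptime(raw, fmt)
    except ValueError:
        return None


# Made-up patient IDs with duplicates.
patient_ids = ["p3", "p1", "p3", "p2", "p1"]
unique_ids = ordered_unique(patient_ids)

# Assumed format string, chosen only for this example.
fmt = "%Y-%m-%d %H:%M:%S"
raw_times = ["2101-10-20 19:08:00", "not a date", None]
parsed = [parse_timestamp(t, fmt) for t in raw_times]

# Log how many values could not be parsed, so failures stay visible.
n_failed = sum(p is None for p in parsed)
logger.info("Parsed %d of %d timestamps", len(parsed) - n_failed, len(parsed))
```

The trade-off with non-strict parsing is that malformed values no longer stop the load, so counting and logging them is what keeps silent data loss from going unnoticed.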

Medical Coding Task Enhancements:

  • Refactored the MIMIC3ICD9Coding task to filter events by hadm_id instead of by timestamp, so notes and codes are associated with the correct admission. Improved logging for cases with missing notes or ICD codes. (pyhealth/tasks/medical_coding.py, R58-R114)
  • Simplified the main method of the medical coding task by removing redundant comments and improving formatting. (pyhealth/tasks/medical_coding.py, L186-R208)
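
The idea behind filtering by hadm_id can be sketched as follows. This is an illustrative stand-in, not the MIMIC3ICD9Coding implementation: the `Event` class, the `build_samples` function, the sample dictionary keys, and the example data are all hypothetical.

```python
import logging
from dataclasses import dataclass
from typing import Dict, List

logger = logging.getLogger(__name__)


@dataclass
class Event:
    """Hypothetical event record: an admission ID plus a payload."""
    hadm_id: str
    value: str


def group_by_admission(events: List[Event]) -> Dict[str, List[str]]:
    """Collect event payloads under the admission they belong to."""
    grouped: Dict[str, List[str]] = {}
    for event in events:
        grouped.setdefault(event.hadm_id, []).append(event.value)
    return grouped


def build_samples(patient_id: str, notes: List[Event], codes: List[Event]) -> List[dict]:
    """Pair notes and ICD codes that share an hadm_id; log and skip the rest."""
    notes_by_adm = group_by_admission(notes)
    codes_by_adm = group_by_admission(codes)
    samples = []
    # Visit admissions in first-seen order so the output is deterministic.
    for hadm_id in dict.fromkeys([*notes_by_adm, *codes_by_adm]):
        text = notes_by_adm.get(hadm_id)
        icd = codes_by_adm.get(hadm_id)
        if not text:
            logger.warning("Patient %s, admission %s: no notes, skipping", patient_id, hadm_id)
            continue
        if not icd:
            logger.warning("Patient %s, admission %s: no ICD codes, skipping", patient_id, hadm_id)
            continue
        samples.append(
            {"patient_id": patient_id, "hadm_id": hadm_id, "text": " ".join(text), "icd_codes": icd}
        )
    return samples


# Made-up data: only admission "100" has both a note and codes.
notes = [Event("100", "Discharge summary."), Event("101", "Radiology report.")]
codes = [Event("100", "4019"), Event("100", "25000"), Event("102", "4280")]
samples = build_samples("p1", notes, codes)
```

Matching on an explicit admission key avoids the ambiguity of timestamp windows, where a note written close to an admission boundary could be attributed to the wrong stay.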

Testing and Logging Updates:

  • Reorganized imports and improved the structure of the train_medical_coding function in test.py. Enhanced logging and fixed minor formatting issues. (pyhealth/unittests/test.py)
  • Updated logging configuration and improved readability in test_mortality_prediction.py. Refactored dataset initialization for clarity. (pyhealth/unittests/test_mortality_prediction.py)

These changes collectively improve the robustness, maintainability, and usability of the pyhealth library, particularly for handling MIMIC datasets and medical coding tasks.

jhnwu3, May 04 '25 20:05