PyHealth
Added timestamp formats required by mimic3, and fixed storageapis so date parsing should work better now. Also more logging for transparency.
This pull request introduces enhancements to the pyhealth library, focusing on improving dataset handling, task implementation, and testing. Key changes include updates to timestamp handling, the addition of new attributes and joins in dataset configurations, and refactoring of the medical coding task to improve data filtering and logging. Below is a categorized summary of the most important changes:
Dataset Handling Improvements:
- Enabled ordered uniqueness in `collected_global_event_df` by adding the `maintain_order=True` parameter to the `.unique()` method. This ensures consistent ordering of patient IDs during development mode. (pyhealth/datasets/base_dataset.py, L146-R146)
- Added support for non-strict timestamp parsing and enhanced logging for timestamp-related operations in `load_table`. (pyhealth/datasets/base_dataset.py, R229-R244)
- Updated MIMIC-III dataset configuration to include `timestamp_format` for tables and added `hadm_id` as an attribute in multiple tables. Introduced a join operation for `NOTEEVENTS` with `ADMISSIONS` to enrich data. (pyhealth/datasets/configs/mimic3.yaml, [1] [2] [3] [4])
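
The two `base_dataset.py` behaviours above can be illustrated without the library itself. The following is a minimal standard-library sketch, not PyHealth's actual code: `parse_timestamp` and `ordered_unique` are hypothetical helpers, and the format string is an assumed example rather than a value taken from `mimic3.yaml`. It shows what "non-strict" parsing is taken to mean here (a value that does not match the format becomes a null and is logged, instead of raising) and what order-preserving uniqueness buys (first occurrences stay in input order, so a development-mode subset is reproducible).

```python
import logging
from datetime import datetime
from typing import Iterable, List, Optional

logger = logging.getLogger(__name__)


def parse_timestamp(value: Optional[str], fmt: str) -> Optional[datetime]:
    """Non-strict parse: return None instead of raising on a bad value."""
    if value is None:
        return None
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        # Log the failure so silently dropped timestamps remain visible.
        logger.warning("Could not parse timestamp %r with format %r", value, fmt)
        return None


def ordered_unique(ids: Iterable[str]) -> List[str]:
    """Deduplicate while keeping first-seen order (dicts preserve insertion order)."""
    return list(dict.fromkeys(ids))


# Assumed example format; the real per-table formats are set in the dataset config.
FMT = "%Y-%m-%d %H:%M:%S"
parsed = [parse_timestamp(v, FMT) for v in ["2101-10-20 19:08:00", "not a date", None]]
patient_ids = ordered_unique(["p3", "p1", "p3", "p2", "p1"])
```

Here `parsed` holds one `datetime` followed by two `None` values, and `patient_ids` keeps the order in which each ID first appeared.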
Medical Coding Task Enhancements:
- Refactored the `MIMIC3ICD9Coding` task to filter events by `hadm_id` instead of timestamps, ensuring more accurate data association. Improved logging to handle cases with missing notes or ICD codes. (pyhealth/tasks/medical_coding.py, R58-R114)
- Simplified the `main` method of the medical coding task by removing redundant comments and improving formatting. (pyhealth/tasks/medical_coding.py, L186-R208)
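
To make the `hadm_id`-based filtering concrete, here is a rough sketch of the idea in plain Python. It is not the `MIMIC3ICD9Coding` implementation: the `Event` dataclass, the `build_samples` function, the event-type strings, and the sample keys are all hypothetical. The point is only that notes and ICD codes are matched through the shared admission identifier rather than by comparing timestamps, and that admissions missing either side are logged and skipped.

```python
import logging
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

logger = logging.getLogger(__name__)


@dataclass
class Event:
    """Hypothetical stand-in for one row of an event table."""
    event_type: str  # e.g. "noteevents" or "diagnoses_icd"
    hadm_id: str     # admission identifier shared across tables
    value: str       # note text or ICD-9 code


def build_samples(patient_id: str, events: List[Event]) -> List[Dict]:
    """Group one patient's events by admission and emit one sample per admission."""
    notes: Dict[str, List[str]] = defaultdict(list)
    codes: Dict[str, List[str]] = defaultdict(list)
    for ev in events:
        if ev.event_type == "noteevents":
            notes[ev.hadm_id].append(ev.value)
        elif ev.event_type == "diagnoses_icd":
            codes[ev.hadm_id].append(ev.value)

    samples = []
    # dict.fromkeys keeps admissions in first-seen order.
    for hadm_id in dict.fromkeys(ev.hadm_id for ev in events):
        if not notes[hadm_id] or not codes[hadm_id]:
            logger.info("Skipping admission %s: missing notes or ICD codes", hadm_id)
            continue
        samples.append({
            "patient_id": patient_id,
            "hadm_id": hadm_id,
            "text": " ".join(notes[hadm_id]),
            "icd_codes": sorted(set(codes[hadm_id])),
        })
    return samples


events = [
    Event("noteevents", "100", "Chest pain."),
    Event("diagnoses_icd", "100", "4019"),
    Event("diagnoses_icd", "100", "41401"),
    Event("noteevents", "200", "Follow-up."),  # no ICD codes, so it is skipped
]
samples = build_samples("p1", events)
```

Keying on the admission identifier avoids the ambiguity of time-window matching, where a note charted slightly outside an admission's recorded interval would be lost or attached to the wrong stay.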
Testing and Logging Updates:
- Reorganized imports and improved the structure of the `train_medical_coding` function in `test.py`. Enhanced logging and fixed minor formatting issues. (pyhealth/unittests/test.py, [1] [2])
- Updated logging configurations and improved readability in `test_mortality_prediction.py`. Refactored dataset initialization for better clarity. (pyhealth/unittests/test_mortality_prediction.py, [1] [2])
These changes collectively improve the robustness, maintainability, and usability of the pyhealth library, particularly for handling MIMIC datasets and medical coding tasks.