Temporal MIMIC Dataset, Task, and Tests
Name: Erwan Caron
NetID: ecaron2
Reproducing Paper Title: HIST-AID: Leveraging Historical Patient Reports for Enhanced Multi-Modal Automatic Diagnosis
Contribution Type: Dataset, Task, and Tests
I added a dataset class that brings the HIST-AID Temporal MIMIC dataset into the PyHealth framework. pyhealth/datasets/temporal_mimic.py reads the preprocessed Temporal MIMIC CSV generated by the HIST-AID pipeline and loads it into PyHealth. The column-to-field mappings are defined in pyhealth/datasets/configs/temporal_mimic.yaml. The default task, in pyhealth/tasks/temporal_mimic.py, pairs each radiology report and image path with its multilabel pathology labels, producing samples for multilabel classification.
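As a rough illustration of the kind of sample the default task produces, the sketch below pairs each row's report text and image path with a multi-hot label vector. It does not use the PyHealth API. The column names (`subject_id`, `report`, `image_path`), the three-label list, and the helper `build_samples` are hypothetical stand-ins, not the actual fields defined in temporal_mimic.yaml.

```python
import csv
import io

# Hypothetical label columns; the real set comes from the HIST-AID pipeline.
LABELS = ["Atelectasis", "Cardiomegaly", "Edema"]

# Tiny in-memory stand-in for the preprocessed CSV (column names are assumed).
CSV_TEXT = """subject_id,report,image_path,Atelectasis,Cardiomegaly,Edema
p1,No acute findings.,files/p1/a.jpg,0,0,0
p1,Mild cardiomegaly with edema.,files/p1/b.jpg,0,1,1
"""


def build_samples(csv_text, labels=LABELS):
    """Turn CSV rows into multilabel samples, one dict per row."""
    samples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        samples.append({
            "patient_id": row["subject_id"],
            "report": row["report"],
            "image_path": row["image_path"],
            # Multi-hot vector: 1 where the pathology is marked positive.
            "label": [int(row[name] == "1") for name in labels],
        })
    return samples


samples = build_samples(CSV_TEXT)
print(samples[1]["label"])  # [0, 1, 1]
```

Keeping the labels as a fixed-order multi-hot vector means every sample has the same output shape, which is what a multilabel classifier expects.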
To test the implementation, run the unit tests in tests/core/test_temporal_mimic.py. One test verifies that the records loaded into PyHealth match the CSV rows; the other verifies the output of the TemporalMIMICMultilabelClassification task.
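The first kind of check can be pictured roughly as follows: write a small CSV to a temporary directory, load it back, and compare the result with the source rows. This is only a sketch of the pattern, not the contents of tests/core/test_temporal_mimic.py. `load_rows` is a hypothetical stand-in for the dataset loader, and the column names are invented for illustration.

```python
import csv
import os
import tempfile
import unittest


def load_rows(path):
    """Hypothetical stand-in for the dataset loader: read CSV rows as dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


class TestRowMapping(unittest.TestCase):
    def test_rows_round_trip(self):
        # Assumed columns; the real ones are defined in the YAML config.
        expected = [
            {"subject_id": "p1", "report": "No acute findings.", "Edema": "0"},
            {"subject_id": "p2", "report": "Pulmonary edema.", "Edema": "1"},
        ]
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "temporal_mimic.csv")
            with open(path, "w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=list(expected[0]))
                writer.writeheader()
                writer.writerows(expected)
            # Every loaded record should match its source row exactly.
            self.assertEqual(load_rows(path), expected)


# Build and run the suite in-process so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestRowMapping)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Using a temporary directory keeps the test independent of the real MIMIC data, so it can run without credentialed access.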
To test at full scale, the dataset module includes a main method that loads the dataset and runs its default task. This requires access to the full MIMIC-CXR-JPG dataset as well as the preprocessed CSVs generated by the HIST-AID pipeline.