Add NoisyMRC ICU Mortality Dataset + Task
Type: Dataset + Task Contribution
Net IDs: pbadan2, isanaka2
Paper: Minimax Risk Classifiers for Mislabeled Data: a Study on Patient Outcome Prediction Tasks
Paper Link: https://proceedings.mlr.press/v252/filippozzi24a.html
Dataset Referenced by paper: https://www.kaggle.com/competitions/widsdatathon2020/data
Description:
Adds a Noisy MRC ICU dataset implementation based on the Minimax Risk Classifier (MRC) framework. The dataset contains patient ICU visits from the MIT GOSSIS WiDS Datathon 2020 dataset, but it uses the preprocessed version released by the MRC authors. That version comes in two feature sets: alsocat (one-hot encoded categorical features) and nocat (continuous features). The implementation provides a NoisyMRCICUMortalityDataset class that inherits from BaseDataset, a YAML config covering both feature sets, built-in preprocessing, tests, and documentation.
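To make the difference between the two feature sets concrete, here is a small sketch that derives a nocat-style row (continuous features) and an alsocat-style row from one raw visit. It assumes that alsocat keeps the continuous columns and appends one 0/1 indicator per category. The column names (`age`, `bmi`, `gender`, `icu_type`), the category vocabularies, and the helpers `to_nocat` / `to_alsocat` are hypothetical placeholders. They are not the actual columns or code used for the MRC authors' preprocessed files.

```python
from typing import Dict, List, Union

Value = Union[float, int, str]

# Hypothetical schema for illustration only; the real MRC files define their own columns.
CONTINUOUS_COLS: List[str] = ["age", "bmi"]
CATEGORICAL_VOCAB: Dict[str, List[str]] = {
    "gender": ["F", "M"],
    "icu_type": ["CCU", "MICU", "SICU"],
}


def to_nocat(row: Dict[str, Value]) -> Dict[str, float]:
    """Keep only the continuous features (nocat-style)."""
    return {col: float(row[col]) for col in CONTINUOUS_COLS}


def to_alsocat(row: Dict[str, Value]) -> Dict[str, float]:
    """Continuous features plus one 0/1 indicator per category level (assumed alsocat-style)."""
    features = to_nocat(row)
    for col, vocab in CATEGORICAL_VOCAB.items():
        for level in vocab:
            features[f"{col}_{level}"] = 1.0 if row[col] == level else 0.0
    return features


# Example visit with toy values.
visit = {"age": 67, "bmi": 28.4, "gender": "M", "icu_type": "MICU"}
nocat_row = to_nocat(visit)      # {'age': 67.0, 'bmi': 28.4}
alsocat_row = to_alsocat(visit)  # adds gender_F/gender_M and icu_type_* indicators
```

Because each categorical column expands into one indicator per level, the alsocat rows are wider than the nocat rows. Only the feature dictionary changes between the two; the label handling stays the same.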
The mortality prediction task converts each ICU visit (one row) into a tabular sample that contains all processed features, with the in-hospital death outcome as the binary label y (0 or 1). The same task applies to both the alsocat and nocat feature sets and supports downstream MRC experiments.
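The following is a minimal sketch of the row-to-sample transform described above. It is written in plain Python rather than against PyHealth's task API. Each visit row becomes one sample whose features are every processed column except the identifiers and the outcome, and whose label `y` is the 0/1 hospital death value. The column names `encounter_id`, `patient_id`, and `hospital_death` are assumptions: the WiDS 2020 data uses them, but the MRC preprocessed files may name them differently. `visit_to_sample` and `build_samples` are hypothetical helpers, not the code in `pyhealth/tasks/mrc_task.py`.

```python
import csv
import io
from typing import Dict, Iterable, List, Set

# Assumed column names; adjust to match the actual preprocessed MRC files.
ID_COLS: Set[str] = {"encounter_id", "patient_id"}
LABEL_COL = "hospital_death"


def visit_to_sample(row: Dict[str, str]) -> Dict[str, object]:
    """Turn one ICU visit (one CSV row) into a tabular sample with a binary label y."""
    label = int(float(row[LABEL_COL]))
    if label not in (0, 1):
        raise ValueError(f"expected a 0/1 label, got {row[LABEL_COL]!r}")
    # Every remaining column is treated as a numeric feature.
    features = {
        col: float(val)
        for col, val in row.items()
        if col not in ID_COLS and col != LABEL_COL
    }
    return {"visit_id": row.get("encounter_id"), "features": features, "y": label}


def build_samples(rows: Iterable[Dict[str, str]]) -> List[Dict[str, object]]:
    """Apply the transform to every row; identical for alsocat and nocat files."""
    return [visit_to_sample(r) for r in rows]


# Tiny in-memory CSV standing in for a preprocessed file (toy values for illustration).
demo_csv = io.StringIO(
    "encounter_id,patient_id,age,bmi,hospital_death\n"
    "1,10,67,28.4,0\n"
    "2,11,81,22.1,1\n"
)
samples = build_samples(csv.DictReader(demo_csv))
```

Keeping the transform independent of which columns are present means that the same code can consume either feature set. Only the width of `features` differs.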
Files added:
- pyhealth/datasets/mrc.py
- pyhealth/datasets/configs/mrc.yaml
- pyhealth/tasks/mrc_task.py
- tests/core/test_mrc.py
- docs/api/datasets/pyhealth.datasets.NoisyMRCICUMortalityDataset.rst
Files changed:
- pyhealth/datasets/__init__.py
- pyhealth/tasks/__init__.py
- docs/api/datasets.rst