Add MIMIC CXR Reports Dataset Class

Open lokanathdas1989 opened this issue 1 month ago • 0 comments

This contribution:

  1. Adds a new dataset: MIMIC-CXR Database 2.1.0
  2. Implements a new dataset class compliant with PyHealth’s BaseDataset
  3. Adds a Pydantic-validated YAML config
  4. Extracts PATIENTID/STUDYID/FINDINGS/IMPRESSION sections automatically
  5. Adds no breaking changes to PyHealth’s existing datasets
  6. Keeps data access external (MIMIC files must be obtained from PhysioNet through Credentialed Access)
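The extraction in item 4 could look roughly like the sketch below. It is not the code from this PR: the function names `ids_from_path` and `extract_sections` are hypothetical, the path layout (`.../p<patient_id>/s<study_id>.txt`) is an assumption about how the report files are organized, and the sample report text is made up for illustration.

```python
import re
from typing import Dict, Optional, Tuple

# Assumption: each report lives at .../p<patient_id>/s<study_id>.txt
_PATH_RE = re.compile(r"p(\d+)[/\\]s(\d+)\.txt$")

# Assumption: a section header is an upper-case label at the start of a line,
# followed by a colon (e.g. "FINDINGS:").
_HEADER_RE = re.compile(r"^[ \t]*([A-Z][A-Z ]+):", re.MULTILINE)


def ids_from_path(path: str) -> Tuple[str, str]:
    """Return (patient_id, study_id) parsed from a report file path."""
    match = _PATH_RE.search(path)
    if match is None:
        raise ValueError(f"unrecognized report path: {path!r}")
    return match.group(1), match.group(2)


def extract_sections(report_text: str) -> Dict[str, Optional[str]]:
    """Return the FINDINGS and IMPRESSION bodies, or None when a section is absent."""
    sections: Dict[str, Optional[str]] = {"FINDINGS": None, "IMPRESSION": None}
    headers = list(_HEADER_RE.finditer(report_text))
    for i, header in enumerate(headers):
        name = header.group(1).strip()
        if name not in sections:
            continue  # other headers only delimit the sections we keep
        # A section body runs until the next header or the end of the report.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(report_text)
        body = " ".join(report_text[header.end():end].split())
        sections[name] = body or None
    return sections


# Made-up report text, for illustration only.
sample = (
    " FINAL REPORT\n"
    " EXAMINATION: CHEST (PA AND LAT)\n\n"
    " FINDINGS:\n Lungs are clear.\n No pleural effusion.\n\n"
    " IMPRESSION:\n No acute process.\n"
)
patient_id, study_id = ids_from_path("files/p10/p10000032/s50414267.txt")
parsed = extract_sections(sample)
```

Returning `None` for a missing section, rather than an empty string, lets downstream code distinguish "section absent" from "section present but empty" if that matters for the task.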

This PR is submitted by the following group of UIUC students:

  1. Lokanath Das (ldas2)
  2. Jared Backofen (jaredb3)
  3. Jacob Ray Fuehne (jfuehne2)

The following files are introduced as part of this PR:

pyhealth/
├── datasets/
│   ├── mimic_cxr_reports.py          # Dataset implementation
│   ├── configs/
│   │   └── mimic_cxr_reports.yaml    # Dataset configuration (Pydantic validated)
│   └── __init__.py                   # Updated the Dataset class relative import
├── tests/
│   └── test_mimic_cxr_reports.py     # Test script for dataset loader
└── docs/
    └── README_mimic_cxr_reports.md   # Documentation

lokanathdas1989, Dec 02 '25 04:12