PyHealth
LabTOP Contribution for DL4H
Who I am
- Name: Man Ling Ho
- NetID: mlho2
Type of Contribution
- New model integration: LabTOP (Unified Autoregressive Transformer for Lab Test Outcome Prediction)
- New tasks: Lab Test Prediction MIMIC-IV
- New example script: Demo script for training and evaluating LabTOP on a MIMIC-IV subset
High-Level Summary
This PR integrates the LabTOP model (Hur et al., 2024) into a reproducible PyTorch pipeline for lab test outcome prediction from ICU EHR data. The integration adds the following.
LabTOP model, implemented as a BaseModel wrapper with:
- Digit-level lab value tokenization
- Absolute temporal embeddings
- Autoregressive Transformer encoder
- Token-level output and loss computation
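To illustrate what digit-level tokenization of a lab value could look like, here is a minimal sketch. The function names `tokenize_lab_value` and `detokenize_lab_value` and the fixed two-decimal formatting are assumptions made for illustration; they are not taken from the PR's code, which may format and split values differently.

```python
from typing import List


def tokenize_lab_value(value: float, decimals: int = 2) -> List[str]:
    """Split a numeric lab value into one token per character.

    Hypothetical helper: the fixed number of decimals is an assumption,
    chosen so every value maps to a predictable token sequence.
    """
    text = f"{value:.{decimals}f}"
    return list(text)


def detokenize_lab_value(tokens: List[str]) -> float:
    """Join character tokens back into a float."""
    return float("".join(tokens))


ph_tokens = tokenize_lab_value(7.4)
print(ph_tokens)                        # ['7', '.', '4', '0']
print(detokenize_lab_value(ph_tokens))  # 7.4
```

Splitting values into characters keeps the vocabulary small (ten digits plus a decimal point and sign), so the model can emit any numeric value one token at a time.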
Task class: Lab Test Prediction MIMIC-IV
- Generates sequences of lab events per patient admission
- Includes demographic tokens for age and gender
- Computes predictions for next lab value events
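The sequence construction described above can be sketched roughly as follows. The token formats (`[AGE_60-69]`, `[GENDER_F]`, `[LAB_...]`, `[EOE]`), the ten-year age buckets, and the `(time, lab_name, value)` event tuples are all hypothetical choices for this example and are not confirmed by the PR.

```python
from typing import List, Tuple

# Hypothetical event format: (hours since admission, lab name, numeric value)
LabEvent = Tuple[float, str, float]


def age_bucket(age: int, width: int = 10) -> str:
    """Map an age to a coarse bucket token (bucket width is an assumption)."""
    low = (age // width) * width
    return f"[AGE_{low}-{low + width - 1}]"


def build_sequence(age: int, gender: str, events: List[LabEvent]) -> List[str]:
    """Build one token sequence for a single admission.

    Demographic tokens come first, followed by lab events in time order.
    Each event is a lab-name token, the value's characters, and an
    end-of-event marker.
    """
    tokens = [age_bucket(age), f"[GENDER_{gender.upper()}]"]
    for _, lab_name, value in sorted(events, key=lambda e: e[0]):
        tokens.append(f"[LAB_{lab_name}]")
        tokens.extend(f"{value:.2f}")  # one token per character
        tokens.append("[EOE]")
    return tokens


def next_token_pairs(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Shift the sequence by one position to get (inputs, targets)."""
    return tokens[:-1], tokens[1:]


seq = build_sequence(67, "f", [(2.0, "sodium", 138.0), (1.0, "ph", 7.4)])
inputs, targets = next_token_pairs(seq)
print(seq[:4])  # ['[AGE_60-69]', '[GENDER_F]', '[LAB_ph]', '7']
```

Shifting the sequence by one token is the usual way to frame next-token prediction: at each position, the target is the token that follows.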
Example script:
- Trains LabTOP on a MIMIC-IV subset
- Demonstrates evaluation with MAE, NMAE, and token-level accuracy metrics
The implementation follows the PyHealth contributing guidelines (PEP 8, Google-style docstrings, documented function signatures).
Files to Review
Modified:
- pyhealth/models/__init__.py → register LabTOP
- pyhealth/datasets/__init__.py → register LabTOP
- pyhealth/tasks/__init__.py → register LabTOP
New:
- pyhealth/models/labtop_transformer.py → LabTOP model implementation
- pyhealth/tasks/labtop_next_token.py → Lab Test Prediction MIMIC-IV task
- examples/labtop_example.py → end-to-end training and evaluation demo
- examples/labtop.ipynb → Jupyter notebook version
- tests/core/test_labtop.py → unit tests for the model and task
Dependencies
Runtime dependencies:
- torch, numpy, pandas, scikit-learn, transformers, pyhealth
Optional dependencies:
- matplotlib, seaborn for visualization
How to Run the Example
- Prepare MIMIC-IV subset data:
  - data/admissions.csv.gz
  - data/patients.csv.gz
  - data/labevents.csv.gz
- Run training and evaluation:
  - python examples/labtop_example.py
- The script demonstrates:
  - Construction of tokenized lab sequences with demographics
  - Training LabTOP for 10 epochs
  - Evaluation with token-level accuracy, MAE, and NMAE metrics
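For readers unfamiliar with the reported metrics, the sketch below shows one way to compute them in plain Python. The PR text does not define NMAE, so this example assumes it is MAE divided by the range of the true values. The function names and the `pad_id` convention are likewise hypothetical, not the example script's API.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between true and predicted lab values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def nmae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE normalized by the range of the true values.

    Assumption: range normalization. Other definitions (for example,
    dividing by the mean or a percentile spread) are also in use.
    """
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("true values have zero range")
    return mae(y_true, y_pred) / value_range


def token_accuracy(
    true_ids: Sequence[int], pred_ids: Sequence[int], pad_id: int = 0
) -> float:
    """Fraction of non-padding positions where the predicted token matches."""
    pairs = [(t, p) for t, p in zip(true_ids, pred_ids) if t != pad_id]
    if not pairs:
        raise ValueError("no non-padding tokens to score")
    return sum(t == p for t, p in pairs) / len(pairs)


print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(nmae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(token_accuracy([5, 6, 0, 7, 8, 9], [5, 1, 0, 7, 8, 9]))  # 0.8
```

Normalizing the error makes results comparable across lab tests with very different scales, such as pH versus sodium.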
References
- Hur, Y., et al., LabTOP: A Unified Model for Lab Test Outcome Prediction on Electronic Health Records, 2024, arXiv preprint.
- MIMIC-IV database: https://physionet.org/content/mimiciv
This is a duplicate of https://github.com/sunlabuiuc/PyHealth/pull/633