LabTOP Contribution for DL4H

Open mlho2 opened this issue 1 month ago • 1 comments

Who I am

Name: Man Ling Ho NetID: mlho2

Type of Contribution

New model integration: LabTOP (Unified Autoregressive Transformer for Lab Test Outcome Prediction)
New tasks: Lab Test Prediction MIMIC-IV
New example script: Demo script for training and evaluating LabTOP on a MIMIC-IV subset

High-Level Summary

This PR integrates the LabTOP model (Hur et al., 2024) into a reproducible PyTorch pipeline for lab test outcome prediction from ICU EHR data. The integration adds:

LabTOP model, implemented as a BaseModel wrapper, with:

Digit-level lab value tokenization
Absolute temporal embeddings
Autoregressive Transformer encoder
Token-level output and loss computation
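To make "digit-level lab value tokenization" concrete, here is a minimal sketch that splits a formatted lab value into one token per character. The function names and the fixed number of decimals are assumptions for illustration, not the API in pyhealth/models/labtop_transformer.py.

```python
from typing import List


def tokenize_value(value: float, decimals: int = 2) -> List[str]:
    """Format a lab value with a fixed number of decimals (an assumption)
    and emit one token per character, e.g. digits, '.', and '-'."""
    text = f"{value:.{decimals}f}"
    return list(text)


def detokenize_value(tokens: List[str]) -> float:
    """Join character tokens back into a float."""
    return float("".join(tokens))


# Example: a potassium reading of 4.1 becomes four character tokens.
example_tokens = tokenize_value(4.1)
print(example_tokens)                    # ['4', '.', '1', '0']
print(detokenize_value(example_tokens))  # 4.1
```

With this scheme the vocabulary for numeric values stays small (ten digits plus a few symbols), and the model predicts a value one character at a time.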

Task class: Lab Test Prediction MIMIC-IV

Generates sequences of lab events per patient admission
Includes demographic tokens for age and gender
Computes predictions for next lab value events
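The sequence construction described above could look roughly like the following sketch. The token formats ([AGE_...], [GENDER_...], [LAB_...], [EOE]) and the LabEvent structure are hypothetical and chosen only for illustration; the actual task class in pyhealth/tasks/labtop_next_token.py may use different names and fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabEvent:
    minutes_since_admit: int  # hypothetical time field
    item: str                 # lab test name
    value: float              # measured value


def build_sequence(age: int, gender: str, events: List[LabEvent]) -> List[str]:
    """Build one token sequence for a single admission:
    demographic tokens first, then lab events in time order."""
    tokens = [f"[AGE_{age}]", f"[GENDER_{gender}]"]
    for ev in sorted(events, key=lambda e: e.minutes_since_admit):
        tokens.append(f"[LAB_{ev.item}]")
        tokens.extend(f"{ev.value:.2f}")  # one token per character of the value
        tokens.append("[EOE]")            # hypothetical end-of-event marker
    return tokens


events = [
    LabEvent(minutes_since_admit=30, item="creatinine", value=1.2),
    LabEvent(minutes_since_admit=10, item="sodium", value=140.0),
]
sequence = build_sequence(age=65, gender="F", events=events)
print(sequence)
```

In a next-token setup, the model is trained to predict each token of this sequence from the tokens that precede it, so the value characters after a [LAB_...] token act as the prediction target for that lab event.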

Example script:

Trains LabTOP on a MIMIC-IV subset
Demonstrates evaluation with MAE, NMAE, and token-level accuracy metrics

All implementation follows PyHealth contributing guidelines (PEP8, Google-style docstrings, documented function signatures).

Files to Review

Modified:

pyhealth/models/__init__.py → register LabTOP
pyhealth/datasets/__init__.py → register LabTOP
pyhealth/tasks/__init__.py → register LabTOP

New:

pyhealth/models/labtop_transformer.py → LabTOP model implementation
pyhealth/tasks/labtop_next_token.py → Lab Test Prediction MIMIC-IV task
examples/labtop_example.py → end-to-end training and evaluation demo
examples/labtop.ipynb → Jupyter notebook version
tests/core/test_labtop.py → unit test for model and task

Dependencies

Runtime dependencies:

torch, numpy, pandas, scikit-learn, transformers, pyhealth

Optional dependencies:

matplotlib, seaborn for visualization

How to Run the Example

Prepare MIMIC-IV subset data:

data/admissions.csv.gz
data/patients.csv.gz
data/labevents.csv.gz

Run training and evaluation:

python examples/labtop_example.py

The script demonstrates:

Construction of tokenized lab sequences with demographics
Training LabTOP for 10 epochs
Evaluation with token-level accuracy, MAE, and NMAE metrics
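For reference, the three metrics can be sketched in plain Python as follows. The NMAE normalization is an assumption (MAE divided by the range of the true values); the example script may normalize differently, for instance per lab item or by the mean.

```python
from typing import List, Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between true and predicted lab values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def nmae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE normalized by the range of the true values (assumed definition)."""
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("true values have zero range; NMAE is undefined")
    return mae(y_true, y_pred) / value_range


def token_accuracy(true_tokens: List[str], pred_tokens: List[str]) -> float:
    """Fraction of positions where the predicted token matches the target."""
    if len(true_tokens) != len(pred_tokens) or not true_tokens:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == p for t, p in zip(true_tokens, pred_tokens))
    return matches / len(true_tokens)


y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.5]
print(mae(y_true, y_pred))
print(nmae(y_true, y_pred))
print(token_accuracy(["1", ".", "2"], ["1", ".", "3"]))
```

Token-level accuracy measures how well individual characters are predicted, while MAE and NMAE compare the decoded numeric values, so the two kinds of metric can disagree: a single wrong leading digit costs little accuracy but produces a large absolute error.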

References

Hur, Y., et al., LabTOP: A Unified Model for Lab Test Outcome Prediction on Electronic Health Records, 2024, arXiv preprint.
MIMIC-IV database: https://physionet.org/content/mimiciv

Dec 07 '25 04:12 mlho2

This is a duplicate of https://github.com/sunlabuiuc/PyHealth/pull/633

Dec 07 '25 16:12 Logiquo