LabTOP Contribution for DL4H

Open: mlho2 opened this issue 1 month ago • 1 comment

Who I am

Name: Man Ling Ho
NetID: mlho2

Type of Contribution

  • New model integration: LabTOP (Unified Autoregressive Transformer for Lab Test Outcome Prediction)
  • New tasks: Lab Test Prediction MIMIC-IV
  • New example script: Demo script for training and evaluating LabTOP on a MIMIC-IV subset

High-Level Summary

This PR integrates the LabTOP model (Hur et al., 2024) into a reproducible PyTorch pipeline for lab test outcome prediction from ICU EHR data. The integration adds the components listed below.

LabTOP model, implemented as a BaseModel wrapper with:

  • Digit-level lab value tokenization
  • Absolute temporal embeddings
  • Autoregressive Transformer encoder
  • Token-level output and loss computation
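As an illustration of the digit-level tokenization listed above, here is a minimal sketch. It assumes values are rendered with a fixed number of decimals and split into one token per character; the function names and the `decimals` parameter are hypothetical and are not taken from labtop_transformer.py.

```python
from typing import List


def tokenize_value(value: float, decimals: int = 2) -> List[str]:
    """Render a lab value as text and emit one token per character.

    Assumption: fixed-precision formatting; the PR's tokenizer may differ.
    """
    text = f"{value:.{decimals}f}"
    return list(text)


def detokenize_value(tokens: List[str]) -> float:
    """Join character tokens back into a float."""
    return float("".join(tokens))


print(tokenize_value(7.35))  # ['7', '.', '3', '5']
```

Splitting values this way keeps the vocabulary small (digits, a decimal point, a sign) and lets the model emit any numeric value one token at a time.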

Task class: Lab Test Prediction MIMIC-IV

  • Generates sequences of lab events per patient admission
  • Includes demographic tokens for age and gender
  • Computes predictions for next lab value events
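The sequence construction described above could look roughly like the sketch below. The token formats (`age_65`, `gender_F`, `lab_sodium`, `[EOE]`) and the `build_sequence` helper are assumptions made for illustration; the actual task class in labtop_next_token.py may use different names and also carry timestamps for the temporal embeddings.

```python
from typing import List, Tuple

# Each event is (hours_since_admission, lab_name, value); this layout is hypothetical.
LabEvent = Tuple[float, str, float]


def build_sequence(age: int, gender: str, events: List[LabEvent]) -> List[str]:
    """Build one token sequence for a single admission."""
    # Demographic tokens come first.
    tokens = [f"age_{age}", f"gender_{gender}"]
    # Lab events follow in chronological order.
    for _, name, value in sorted(events, key=lambda e: e[0]):
        tokens.append(f"lab_{name}")
        tokens.extend(f"{value:.1f}")  # one token per character of the value
        tokens.append("[EOE]")  # hypothetical end-of-event marker
    return tokens


events = [(2.0, "glucose", 98.0), (1.0, "sodium", 140.0)]
print(build_sequence(65, "F", events))
```

In this sketch the time field is used only for ordering; a model with temporal embeddings would also need the timestamps aligned with each token.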

Example script:

  • Trains LabTOP on a MIMIC-IV subset
  • Demonstrates evaluation with MAE, NMAE, and token-level accuracy metrics
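For reference, the three metrics can be sketched in plain Python as follows. The issue does not define NMAE, so this sketch assumes MAE divided by the range of the true values; the example script may normalize differently (for example per lab item). All function names here are hypothetical.

```python
from typing import List, Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error over paired values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def nmae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE normalized by the range of the true values (an assumption)."""
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("NMAE is undefined when all true values are equal")
    return mae(y_true, y_pred) / value_range


def token_accuracy(true_tokens: List[str], pred_tokens: List[str]) -> float:
    """Fraction of positions where the predicted token matches the target."""
    length = max(len(true_tokens), len(pred_tokens))
    if length == 0:
        raise ValueError("token sequences are empty")
    matches = sum(t == p for t, p in zip(true_tokens, pred_tokens))
    return matches / length


print(token_accuracy(list("7.35"), list("7.45")))  # 0.75
```

Token-level accuracy and MAE measure different things: a prediction of 7.45 for a target of 7.35 is wrong at one token position but has a small absolute error.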

The implementation follows the PyHealth contributing guidelines (PEP 8, Google-style docstrings, documented function signatures).

Files to Review

Modified:

  • pyhealth/models/__init__.py → register LabTOP
  • pyhealth/datasets/__init__.py → register LabTOP
  • pyhealth/tasks/__init__.py → register LabTOP

New:

  • pyhealth/models/labtop_transformer.py → LabTOP model implementation
  • pyhealth/tasks/labtop_next_token.py → Lab Test Prediction MIMIC-IV task
  • examples/labtop_example.py → end-to-end training and evaluation demo
  • examples/labtop.ipynb → Jupyter notebook version
  • tests/core/test_labtop.py → unit tests for the model and task

Dependencies

Runtime dependencies:

  • torch, numpy, pandas, scikit-learn, transformers, pyhealth

Optional dependencies:

  • matplotlib, seaborn for visualization

How to Run the Example

  1. Prepare MIMIC-IV subset data:
     • data/admissions.csv.gz
     • data/patients.csv.gz
     • data/labevents.csv.gz
  2. Run training and evaluation:
     • python examples/labtop_example.py
  3. The script demonstrates:
     • Construction of tokenized lab sequences with demographics
     • Training LabTOP for 10 epochs
     • Evaluation with token-level accuracy, MAE, and NMAE metrics
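Before running the example, it can help to confirm that the three input files from step 1 are in place. The helper below is a small sketch using only the standard library; `missing_files` is a hypothetical name and is not part of the PR.

```python
from pathlib import Path
from typing import List

# File names taken from step 1 of the instructions above.
REQUIRED_FILES = ("admissions.csv.gz", "patients.csv.gz", "labevents.csv.gz")


def missing_files(data_dir: str = "data") -> List[str]:
    """Return the required file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


# An empty list means all three files were found.
print(missing_files("data"))
```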

References

  • Hur, Y., et al., LabTOP: A Unified Model for Lab Test Outcome Prediction on Electronic Health Records, 2024, arXiv preprint.
  • MIMIC-IV database: https://physionet.org/content/mimiciv

mlho2, Dec 07 '25 04:12

This is a duplicate of https://github.com/sunlabuiuc/PyHealth/pull/633

Logiquo, Dec 07 '25 16:12