PyHealth
Add LabTOP model
UIUC ID: ruokunn2
Summary
This PR adds the LabTOP (Lab Test Outcome Prediction) model to PyHealth, enabling continuous numerical prediction of laboratory test values.
LabTOP uses digit-wise tokenization to represent numerical values as sequences of individual digits (e.g., 123.45 → ['1','2','3','.','4','5']), preserving exact precision while maintaining a compact vocabulary of ~20-50 tokens.
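A minimal sketch of the digit-wise idea described above, not the PR's actual `DigitWiseTokenizer`. The function names `number_to_digit_tokens` and `digit_tokens_to_number` are hypothetical, and taking the value as a string (so that trailing zeros survive) is an assumption made for illustration.

```python
from typing import List

# Characters assumed to be valid in a numeric lab value (illustrative choice).
ALLOWED_CHARS = set("0123456789.-")


def number_to_digit_tokens(value: str) -> List[str]:
    """Split a numeric string into one token per character."""
    tokens = list(value)
    unknown = [t for t in tokens if t not in ALLOWED_CHARS]
    if unknown:
        raise ValueError(f"unexpected characters in {value!r}: {unknown}")
    return tokens


def digit_tokens_to_number(tokens: List[str]) -> float:
    """Join digit tokens back into a string and parse it as a float."""
    return float("".join(tokens))


tokens = number_to_digit_tokens("123.45")
print(tokens)
print(digit_tokens_to_number(tokens))
```

Keeping the input as a string means a value such as "7.10" keeps its trailing zero as a token, which would be lost if the value were first converted to a float.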
Paper Reference
- Title: LabTOP: A Unified Model for Lab Test Outcome Prediction on Electronic Health Records
- Authors: Sujeong Im, Jungwoo Oh, Edward Choi
- Conference: CHIL 2025 (Best Paper Award)
- arXiv: https://arxiv.org/abs/2502.14259
- Code: https://github.com/sujeongim/LabTOP
Implementation Details
- Architecture: GPT-2 transformer (12 layers, 768 dim, ~53M parameters)
- Classes Added:
  - DigitWiseTokenizer: Converts numbers ↔ digit sequences
  - LabTOPVocabulary: Manages complete vocabulary (special tokens + digits + lab codes)
  - LabTOP: Main model class inheriting from BaseModel
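To illustrate how a vocabulary of special tokens, digits, and lab codes might be assembled, here is a rough sketch. It is not the PR's `LabTOPVocabulary`: the special-token names, the `build_vocab` and `encode` helpers, and the lab-code strings are all hypothetical.

```python
from typing import Dict, List

# Assumed special tokens; the real model may use different ones.
SPECIAL_TOKENS = ["<pad>", "<bos>", "<eos>", "<unk>"]
# One token per digit, plus decimal point and minus sign.
DIGIT_TOKENS = list("0123456789") + [".", "-"]


def build_vocab(lab_codes: List[str]) -> Dict[str, int]:
    """Assign consecutive ids: special tokens, then digits, then lab codes."""
    vocab: Dict[str, int] = {}
    for token in SPECIAL_TOKENS + DIGIT_TOKENS + sorted(set(lab_codes)):
        if token in vocab:
            raise ValueError(f"token {token!r} collides with an existing entry")
        vocab[token] = len(vocab)
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids, falling back to <unk> for unseen tokens."""
    unk_id = vocab["<unk>"]
    return [vocab.get(t, unk_id) for t in tokens]


# Hypothetical lab-code tokens, used only for this example.
vocab = build_vocab(["LAB_50912", "LAB_50971"])
ids = encode(["<bos>", "LAB_50912", "1", ".", "2", "<eos>"], vocab)
print(len(vocab), ids)
```

Because numbers are spelled out digit by digit, the numeric part of the vocabulary stays fixed at a dozen tokens no matter how many distinct values appear in the data.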
Files Modified
- ✅ pyhealth/models/labtop.py (new file, ~600 lines)
Performance (from paper)
- MAE: 0.064, SMAPE: 14.80%, NMAE: 0.042 on MIMIC-IV (44 lab types)
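For readers unfamiliar with these metrics, the sketch below computes them under common textbook definitions. These are assumptions: the paper's exact SMAPE variant and the normalization it uses for NMAE (range of the true values is assumed here) may differ, so these functions are not guaranteed to reproduce the reported numbers.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Symmetric MAPE in percent, using the 2|t-p| / (|t|+|p|) variant (assumed)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = abs(t) + abs(p)
        # Treat a pair of exact zeros as zero error to avoid dividing by zero.
        total += 0.0 if denom == 0 else 2.0 * abs(t - p) / denom
    return 100.0 * total / len(y_true)


def nmae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE divided by the range of the true values (assumed normalization)."""
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("true values have zero range; NMAE is undefined")
    return mae(y_true, y_pred) / value_range


y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.0]
print(mae(y_true, y_pred), smape(y_true, y_pred), nmae(y_true, y_pred))
```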
Could you add some test cases for this model? Thanks.