
Add LabTOP model

Open • ruokun-niu opened this issue 1 month ago • 1 comment

UIUC ID: ruokunn2

Summary

This PR adds the LabTOP (Lab Test Outcome Prediction) model to PyHealth, enabling continuous numerical prediction of laboratory test values.

LabTOP uses digit-wise tokenization to represent numerical values as sequences of individual digits (e.g., 123.45 → ['1', '2', '3', '.', '4', '5']). This preserves exact precision while keeping the vocabulary compact (~20-50 tokens).
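Below is a minimal sketch of the digit-wise idea. It assumes values arrive as numeric strings and that the digit vocabulary is only 0-9, '.', and '-'. The function names and token set are hypothetical and are not taken from `DigitWiseTokenizer` in `labtop.py`:

```python
from decimal import Decimal, InvalidOperation

# Hypothetical digit vocabulary for illustration: ten digits, decimal point, minus sign.
DIGIT_TOKENS = [str(d) for d in range(10)] + [".", "-"]


def number_to_digits(value: str) -> list[str]:
    """Split a numeric string into one token per character.

    Working on the string form (not a float) keeps the recorded precision,
    e.g. "7.10" keeps its trailing zero.
    """
    s = value.strip()
    try:
        Decimal(s)  # rejects malformed input such as "1.2.3", "--5", or ""
    except InvalidOperation as exc:
        raise ValueError(f"not a number: {value!r}") from exc
    tokens = list(s)
    for tok in tokens:
        if tok not in DIGIT_TOKENS:  # e.g. exponent notation like "1e5"
            raise ValueError(f"unsupported character {tok!r} in {value!r}")
    return tokens


def digits_to_number(tokens: list[str]) -> float:
    """Join digit tokens back into a numeric value."""
    return float("".join(tokens))
```

For example, `number_to_digits("123.45")` yields `['1', '2', '3', '.', '4', '5']`, and `digits_to_number` maps that list back to `123.45`. Because the sketch works on the string rather than a float, `"7.10"` becomes four tokens and the trailing zero survives. A float round-trip would drop it.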

Paper Reference

  • Title: LabTOP: A Unified Model for Lab Test Outcome Prediction on Electronic Health Records
  • Authors: Sujeong Im, Jungwoo Oh, Edward Choi
  • Conference: CHIL 2025 (Best Paper Award)
  • arXiv: https://arxiv.org/abs/2502.14259
  • Code: https://github.com/sujeongim/LabTOP

Implementation Details

  • Architecture: GPT-2 transformer (12 layers, 768 dim, ~53M parameters)
  • Classes Added:
    • DigitWiseTokenizer: Converts numbers ↔ digit sequences
    • LabTOPVocabulary: Manages complete vocabulary (special tokens + digits + lab codes)
    • LabTOP: Main model class inheriting from BaseModel
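The hypothetical stand-in for `LabTOPVocabulary` below illustrates how the three token groups (special tokens, digits, lab codes) could share a single ID space. Several details are assumptions for illustration only:

- the special-token names
- the `<lab:...>` naming scheme
- the single-event layout `<bos> <lab:code> digits <eos>`

The real class may use a different layout and API.

```python
# Hypothetical token inventory; labtop.py may use different names.
SPECIAL_TOKENS = ["<pad>", "<bos>", "<eos>", "<unk>"]
DIGIT_TOKENS = [str(d) for d in range(10)] + [".", "-"]


class ToyLabVocabulary:
    """Hypothetical stand-in for LabTOPVocabulary: special + digit + lab-code tokens."""

    def __init__(self, lab_codes: list[str]) -> None:
        tokens = SPECIAL_TOKENS + DIGIT_TOKENS + [f"<lab:{c}>" for c in lab_codes]
        # Fixed ordering gives stable IDs: specials first, then digits, then lab codes.
        self.token_to_id = {tok: i for i, tok in enumerate(tokens)}
        self.id_to_token = {i: tok for tok, i in self.token_to_id.items()}

    def __len__(self) -> int:
        return len(self.token_to_id)

    def encode_lab_event(self, lab_code: str, value: str) -> list[int]:
        """Encode one result as <bos> <lab:code> <digit>... <eos> (assumed layout)."""
        tokens = ["<bos>", f"<lab:{lab_code}>", *value.strip(), "<eos>"]
        unk_id = self.token_to_id["<unk>"]
        # Unknown lab codes or characters fall back to <unk>.
        return [self.token_to_id.get(tok, unk_id) for tok in tokens]

    def decode(self, ids: list[int]) -> list[str]:
        """Map token IDs back to token strings."""
        return [self.id_to_token[i] for i in ids]
```

With `ToyLabVocabulary(["creatinine", "potassium"])`, the vocabulary has 18 entries. Calling `encode_lab_event("creatinine", "1.2")` returns `[1, 16, 5, 14, 6, 2]`. Only the lab-code section grows with the number of lab types; the digit section stays fixed at 12 tokens.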

Files Changed

  • pyhealth/models/labtop.py (new file, ~600 lines)

Performance (from paper)

  • MAE: 0.064, SMAPE: 14.80%, NMAE: 0.042 on MIMIC-IV (44 lab types)
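For reference, here is a plain-Python sketch of the three reported metrics. MAE is standard. Two definitions are assumptions:

- SMAPE uses the denominator (|y| + |ŷ|) / 2.
- NMAE divides MAE by the range of the true values.

The paper may define these differently, for example by computing each metric per lab type and then averaging:

```python
from collections.abc import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Symmetric MAPE in percent (assumed variant); pairs where both are 0 count as 0."""
    _check(y_true, y_pred)
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2
        total += 0.0 if denom == 0 else abs(t - p) / denom
    return 100.0 * total / len(y_true)


def nmae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Assumed definition: MAE divided by the range (max - min) of y_true."""
    _check(y_true, y_pred)
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("y_true has zero range; NMAE is undefined")
    return mae(y_true, y_pred) / value_range
```

On `y_true = [1.0, 2.0, 3.0]` and `y_pred = [1.5, 2.0, 2.0]`, this gives MAE 0.5, SMAPE about 26.67%, and NMAE 0.25.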

ruokun-niu • Nov 30 '25 01:11

Could you add some test cases for this model? Thanks.

Logiquo • Dec 07 '25 16:12