
Add OhioT1DM Dataset for Blood Glucose Level Prediction

Open ChengcenZhou opened this issue 1 month ago • 0 comments

Contributor Info

Type of Contribution

  • New dataset (OhioT1DM: Ohio Type 1 Diabetes Mellitus)
  • New tasks (blood glucose prediction, hypoglycemia detection, glucose range classification)
  • New unit tests

Relationship to Previous PR

This PR builds on my earlier contribution in PR #682 (WESAD Dataset for Wearable Stress Detection). Both datasets, WESAD and OhioT1DM, are used in my DLH final project, which replicates "Simulation of Health Time Series with Nonstationarity" (Toye, Gomez, & Kleinberg, 2024).

PR      | Dataset  | Task                     | Domain
#682    | WESAD    | Stress Detection         | Mental Health / Wearables
This PR | OhioT1DM | Blood Glucose Prediction | Diabetes Management

What Is the OhioT1DM Dataset

This PR integrates the OhioT1DM dataset [Marling & Bunescu, 2020] into PyHealth for blood glucose level prediction research. It adds:

  • An OhioT1DMDataset class for loading continuous glucose monitoring (CGM) data from 12 subjects with Type 1 Diabetes
  • Task functions for blood glucose prediction (30-minute and 60-minute horizons), hypoglycemia detection, hyperglycemia detection, and glucose range classification
  • Unit tests with synthetic XML data generation
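As an illustration of how the prediction and detection tasks listed above relate to one another, here is a minimal sketch of window-based sample generation over a gap-free CGM series. It is not the code in this PR: the function names, the sample keys, and the 70 / 180 mg/dL thresholds (a common clinical convention) are all assumptions made for the example.

```python
from typing import Dict, List

# Assumed thresholds in mg/dL (common clinical convention, not taken from this PR).
HYPO_THRESHOLD = 70.0
HYPER_THRESHOLD = 180.0


def classify_range(glucose: float) -> int:
    """Return 0 for below range, 1 for in range, 2 for above range."""
    if glucose < HYPO_THRESHOLD:
        return 0
    if glucose > HYPER_THRESHOLD:
        return 2
    return 1


def make_windows(
    readings: List[float], history_steps: int, horizon_steps: int
) -> List[Dict]:
    """Pair each history window with the reading `horizon_steps` after its end.

    Assumes `readings` is evenly spaced (one value per CGM interval) with no gaps.
    """
    samples = []
    last_start = len(readings) - history_steps - horizon_steps
    for start in range(last_start + 1):
        end = start + history_steps
        # The target lies horizon_steps after the last reading in the history.
        target = readings[end + horizon_steps - 1]
        samples.append(
            {
                "history": readings[start:end],
                "target": target,                        # regression label
                "hypo": int(target < HYPO_THRESHOLD),    # binary label
                "hyper": int(target > HYPER_THRESHOLD),  # binary label
                "range_class": classify_range(target),   # 3-class label
            }
        )
    return samples
```

With 5-minute readings, a 30-minute horizon corresponds to `horizon_steps=6` and a 60-minute horizon to `horizon_steps=12`. Real recordings contain gaps, so an actual implementation would also need to skip or impute windows that span missing data.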

The implementation follows the contributing guidelines (PEP8, Google-style docstrings, and documented function signatures).

Files to Review

Modified:

  • pyhealth/datasets/__init__.py: register OhioT1DMDataset
  • pyhealth/tasks/__init__.py: register blood_glucose_prediction_ohiot1dm_fn

New:

  • pyhealth/datasets/ohiot1dm.py: OhioT1DMDataset class implementation
  • pyhealth/tasks/blood_glucose_prediction_ohiot1dm.py: Task functions for glucose prediction
  • tests/test_ohiot1dm.py: Unit tests for dataset and tasks
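For reviewers unfamiliar with the approach, the following sketch shows one way a test could generate synthetic XML and read it back using only the standard library. It is not the contents of tests/test_ohiot1dm.py. The element names (`patient`, `glucose_level`, `event`), the `ts` / `value` attributes, and the timestamp format are assumptions about the file layout, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

# Assumed timestamp layout for the synthetic files (day-month-year).
TS_FORMAT = "%d-%m-%Y %H:%M:%S"


def build_synthetic_patient(patient_id: str, values, start: datetime) -> str:
    """Return an XML string with one <glucose_level> block of 5-minute events."""
    root = ET.Element("patient", {"id": patient_id})
    glucose = ET.SubElement(root, "glucose_level")
    for i, value in enumerate(values):
        ts = start + timedelta(minutes=5 * i)
        ET.SubElement(
            glucose, "event", {"ts": ts.strftime(TS_FORMAT), "value": str(value)}
        )
    return ET.tostring(root, encoding="unicode")


def parse_glucose(xml_text: str):
    """Parse the synthetic XML back into (timestamp, value) tuples."""
    root = ET.fromstring(xml_text)
    return [
        (datetime.strptime(event.get("ts"), TS_FORMAT), float(event.get("value")))
        for event in root.findall("./glucose_level/event")
    ]
```

Generating the fixture in memory keeps the tests independent of the real dataset, which cannot be redistributed with the repository.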

Dataset Info

Item          | Details
Subjects      | 12 (2018 cohort: 6, 2020 cohort: 6)
Duration      | 8 weeks per subject
CGM Readings  | Every 5 minutes
Data Includes | Glucose, insulin (basal/bolus), meals, exercise, sleep, physiological sensors
Source        | UCI / Ohio University
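To give a sense of scale, the figures above imply the following nominal reading counts. This is only back-of-the-envelope arithmetic: it assumes exactly 8 weeks of uninterrupted 5-minute readings per subject, so real counts will be lower because of sensor gaps. The helper `horizon_to_steps` is a hypothetical name used for illustration.

```python
SAMPLE_INTERVAL_MIN = 5   # one CGM reading every 5 minutes
WEEKS_PER_SUBJECT = 8     # 8 weeks per subject
SUBJECTS = 12             # 6 from the 2018 cohort, 6 from the 2020 cohort

readings_per_day = 24 * 60 // SAMPLE_INTERVAL_MIN               # 288
nominal_per_subject = WEEKS_PER_SUBJECT * 7 * readings_per_day  # 16128
nominal_total = SUBJECTS * nominal_per_subject                  # 193536


def horizon_to_steps(horizon_min: int) -> int:
    """Convert a prediction horizon in minutes to a number of CGM steps."""
    if horizon_min % SAMPLE_INTERVAL_MIN != 0:
        raise ValueError("horizon must be a multiple of the sampling interval")
    return horizon_min // SAMPLE_INTERVAL_MIN
```

Under these assumptions, the 30-minute and 60-minute horizons correspond to 6 and 12 steps ahead, respectively.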

Reference

  • Paper: Marling, C., & Bunescu, R. "The OhioT1DM Dataset for Blood Glucose Level Prediction: Update 2020", CEUR Workshop Proceedings, 2020
  • Paper Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7881904/
  • Dataset Link: https://www.kaggle.com/datasets/ryanmouton/ohiot1dm

How to Use

from pyhealth.datasets import OhioT1DMDataset
from pyhealth.tasks import blood_glucose_prediction_ohiot1dm_fn

# Load dataset
dataset = OhioT1DMDataset(root="/path/to/OhioT1DM/")

# Apply task (30-minute prediction horizon)
dataset = dataset.set_task(blood_glucose_prediction_ohiot1dm_fn)

# Access samples
print(f"Total samples: {len(dataset.samples)}")

ChengcenZhou • Dec 07 '25 05:12