Add MIT-BIH Arrhythmia Dataset, Classification Task, and Example
Summary
This PR adds support for the MIT-BIH Arrhythmia Dataset to PyHealth, including:
• A signal-level dataset loader (MITBIHArrhythmiaDataset)
• A classification task function (mitbih_classification_fn)
• A runnable usage example (mitbih_example.py)
This provides a clean ECG benchmark dataset for PyHealth users and supports reproducible signal-processing research.
Feature
- MITBIHArrhythmiaDataset
  • Loads Kaggle MIT-BIH CSV format (187 signal values + label)
  • Returns unified (signal, label) format consistent with PyHealth signal datasets
  • Supports split="train" | "test" and optional transforms
- mitbih_classification_fn
  Maps dataset samples into PyHealth task format:
  { "patient_id": ..., "visit_id": ..., "signal": Tensor(1×187), "label": int(0–4) }
- Example Script
  • Demonstrates loading dataset → defining task → training a simple CNN classifier
  • Serves as a minimal reproducible example for users
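For reviewers who want to see the data flow without the PyHealth classes, here is a minimal standard-library sketch of the two steps described above: splitting a 188-column row into (signal, label) and wrapping it in the task dictionary. It is not the code in this PR. The helper names (parse_row, to_task_sample, load_samples) and the "beat_<index>" ID scheme are hypothetical, and a nested list stands in for the 1×187 tensor.

```python
import csv
import io

NUM_SAMPLES = 187  # signal values per row, as described above
NUM_CLASSES = 5    # labels 0-4


def parse_row(row):
    """Split one CSV row into (signal, label)."""
    if len(row) != NUM_SAMPLES + 1:
        raise ValueError(f"expected {NUM_SAMPLES + 1} columns, got {len(row)}")
    signal = [float(v) for v in row[:NUM_SAMPLES]]
    # Go through float() so labels written as "2.0" are accepted too.
    label = int(float(row[NUM_SAMPLES]))
    if not 0 <= label < NUM_CLASSES:
        raise ValueError(f"label {label} outside 0-{NUM_CLASSES - 1}")
    return signal, label


def to_task_sample(index, signal, label):
    """Wrap one beat in the task dictionary layout (IDs are an assumption)."""
    beat_id = f"beat_{index}"  # hypothetical ID scheme, one ID per row
    return {
        "patient_id": beat_id,
        "visit_id": beat_id,
        "signal": [signal],  # 1 x 187 nested list standing in for the tensor
        "label": label,
    }


def load_samples(csv_file):
    """Read every non-empty row of an open CSV file into task samples."""
    reader = csv.reader(csv_file)
    return [
        to_task_sample(i, *parse_row(row))
        for i, row in enumerate(reader)
        if row
    ]


# Synthetic two-row file; a real run would open mitbih_train.csv instead.
demo = io.StringIO(
    ",".join(["0.5"] * NUM_SAMPLES + ["2.0"]) + "\n"
    + ",".join(["0.1"] * NUM_SAMPLES + ["0.0"]) + "\n"
)
samples = load_samples(demo)
```

Validating the column count and label range at parse time means a malformed file fails early rather than surfacing as a shape error during training.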
Tests
Basic verification performed:
• Dataset loads correctly from Kaggle CSV files
• Train/test split works as expected
• Task function outputs PyHealth-compliant dictionaries
• Example script runs end-to-end (CPU/Colab tested)
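As an illustration of what an end-to-end CPU check can look like, the sketch below runs one training step of a small 1D CNN on random tensors shaped like the task output (batch × 1 × 187, five classes). It assumes PyTorch is installed. The model name TinyECGCNN and its layer sizes are hypothetical and are not taken from mitbih_example.py.

```python
import torch
from torch import nn


class TinyECGCNN(nn.Module):
    """Hypothetical minimal 1D CNN for 1 x 187 beats and 5 classes."""

    def __init__(self, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapse the time axis to length 1
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        # (batch, 32, 1) -> (batch, 32) -> (batch, num_classes)
        return self.classifier(self.features(x).squeeze(-1))


torch.manual_seed(0)
model = TinyECGCNN()
signals = torch.randn(8, 1, 187)      # random stand-in for real beats
labels = torch.randint(0, 5, (8,))    # random stand-in for real labels
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One forward/backward/update step on CPU.
logits = model(signals)
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Random inputs only confirm that shapes and gradients flow; they say nothing about accuracy on the real data.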
Note on Dataset Download
The MIT-BIH dataset cannot be redistributed. Users must download it manually from Kaggle:
https://www.kaggle.com/datasets/shayanfazeli/heartbeat
Required files: mitbih_train.csv, mitbih_test.csv. Example usage, with root pointing at the directory that holds them:
dataset = MITBIHArrhythmiaDataset(root="/path/to/mitbih", split="train")
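Because the download is manual, a pre-flight check along the following lines can catch a missing file before the dataset is constructed. This is a sketch, not part of the PR: missing_files is a hypothetical helper, and it assumes both CSVs sit directly under root with the names listed above.

```python
import tempfile
from pathlib import Path

REQUIRED_FILES = ("mitbih_train.csv", "mitbih_test.csv")


def missing_files(root):
    """Return the required file names that are not present under root."""
    root = Path(root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


# Demo on a temporary directory that holds only the training file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "mitbih_train.csv").touch()
    missing = missing_files(tmp)
```

In real use, call missing_files("/path/to/mitbih") and raise or print a pointer to the Kaggle page when the returned list is not empty.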