Add MIT-BIH Arrhythmia Dataset, Classification Task, and Example

Open minkuek2 opened this issue 1 month ago • 0 comments

Summary

This PR adds support for the MIT-BIH Arrhythmia Dataset to PyHealth, including:
• A signal-level dataset loader (MITBIHArrhythmiaDataset)
• A classification task function (mitbih_classification_fn)
• A runnable usage example (mitbih_example.py)

This provides a clean ECG benchmark dataset for PyHealth users and supports reproducible signal-processing research.

Features

  1. MITBIHArrhythmiaDataset
• Loads the Kaggle MIT-BIH CSV format (187 signal values + label per row)
• Returns a unified (signal, label) format consistent with PyHealth signal datasets
• Supports split="train" | "test" and optional transforms
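
The loading behavior listed above can be sketched roughly as follows. This is not the PR's code: the class name `MitbihCsvSketch` and its internals are hypothetical, and it uses only the standard library, returning plain Python lists where the real loader presumably returns tensors. It assumes each row holds 187 signal values followed by the class label, and that the files are named mitbih_train.csv and mitbih_test.csv.

```python
import csv
from pathlib import Path
from typing import Callable, List, Optional, Tuple

SIGNAL_LENGTH = 187  # per the PR description: 187 signal values + label


class MitbihCsvSketch:
    """Hypothetical stand-in for MITBIHArrhythmiaDataset (not the PR's code)."""

    def __init__(
        self,
        root: str,
        split: str = "train",
        transform: Optional[Callable[[List[float]], List[float]]] = None,
    ) -> None:
        if split not in ("train", "test"):
            raise ValueError(f"split must be 'train' or 'test', got {split!r}")
        self.transform = transform
        self.samples: List[Tuple[List[float], int]] = []

        path = Path(root) / f"mitbih_{split}.csv"
        with path.open(newline="") as handle:
            for row in csv.reader(handle):
                if not row:
                    continue  # skip blank lines
                values = [float(v) for v in row]
                if len(values) != SIGNAL_LENGTH + 1:
                    raise ValueError(
                        f"expected {SIGNAL_LENGTH + 1} columns, got {len(values)}"
                    )
                # Assumption: the label is the last column, stored as a float
                # such as 0.0, so it is cast to int here.
                self.samples.append((values[:SIGNAL_LENGTH], int(values[-1])))

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Tuple[List[float], int]:
        signal, label = self.samples[index]
        if self.transform is not None:
            signal = self.transform(signal)
        return signal, label
```

Applying the transform in `__getitem__` rather than at load time keeps the stored samples untouched, so the same files can be reused with different transforms.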

  2. mitbih_classification_fn

Maps dataset samples into PyHealth task format:

{
  "patient_id": ...,
  "visit_id": ...,
  "signal": Tensor(1×187),
  "label": int(0–4)
}
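
A minimal sketch of such a mapping is shown here. The function name `mitbih_classification_sketch`, the index-based ID scheme, and the nested list standing in for a 1×187 tensor are all assumptions for illustration; the PR description does not say how patient_id and visit_id are derived.

```python
from typing import Any, Dict, Sequence

NUM_CLASSES = 5      # labels 0-4, per the task format above
SIGNAL_LENGTH = 187  # samples per heartbeat, per the PR description


def mitbih_classification_sketch(
    signal: Sequence[float], label: int, index: int
) -> Dict[str, Any]:
    """Hypothetical mapping of one (signal, label) pair to a task dict."""
    if len(signal) != SIGNAL_LENGTH:
        raise ValueError(f"expected {SIGNAL_LENGTH} values, got {len(signal)}")
    if not 0 <= label < NUM_CLASSES:
        raise ValueError(f"label must be in 0..{NUM_CLASSES - 1}, got {label}")
    # Assumption: rows carry no patient or visit identifiers, so synthetic
    # IDs are derived from the row index.
    return {
        "patient_id": f"beat_{index}",
        "visit_id": f"beat_{index}_0",
        "signal": [list(signal)],  # shape 1 x 187: one channel of 187 samples
        "label": int(label),
    }
```

Validating the length and label range at this step surfaces malformed rows before they reach a model.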

  3. Example Script
• Demonstrates loading the dataset → defining the task → training a simple CNN classifier
• Serves as a minimal reproducible example for users

Tests

Basic verification performed:
• Dataset loads correctly from Kaggle CSV files
• Train/test split works as expected
• Task function outputs PyHealth-compliant dictionaries
• Example script runs end-to-end (CPU/Colab tested)

Note on Dataset Download

The MIT-BIH dataset cannot be redistributed. Users must download it manually from Kaggle:

https://www.kaggle.com/datasets/shayanfazeli/heartbeat

Required files: mitbih_train.csv, mitbih_test.csv

    dataset = MITBIHArrhythmiaDataset(root="/path/to/mitbih", split="train")
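
Because the files must be fetched by hand, a quick pre-flight check can catch a wrong root path before the loader is constructed. The helper below is a hypothetical convenience and not part of the PR; only the two file names come from the note above.

```python
from pathlib import Path
from typing import List

REQUIRED_FILES = ("mitbih_train.csv", "mitbih_test.csv")


def missing_mitbih_files(root: str) -> List[str]:
    """Return the required CSV names that are not present under root."""
    root_path = Path(root)
    return [name for name in REQUIRED_FILES if not (root_path / name).is_file()]


if __name__ == "__main__":
    # "/path/to/mitbih" is the placeholder path from the PR description.
    missing = missing_mitbih_files("/path/to/mitbih")
    if missing:
        print("Download these files from Kaggle first:", ", ".join(missing))
```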

minkuek2, Dec 07 '25 23:12