
Add ECG Processing Support to SHHS dataset for Sleep Signal Analysis

Open Soumya123p opened this issue 1 month ago • 0 comments

Students: Soumya Mazumder (soumyam4), Andrew Salazar (aas15), Sharon Lin (xinyiyl2)

Paper Title: WatchSleepNet: A Novel Model and Pretraining Approach for Advancing Sleep Staging with Smartwatches

Paper Link:

Overview

This PR extends PyHealth's support for the Sleep Heart Health Study (SHHS) dataset so that researchers can work with polysomnographic signals, including EEG and ECG data, for sleep-related cardiovascular and neurological research.

Changes

An overview of the SHHS data can be found here - Sleep Heart Health Study (SHHS). This PR extends the existing SHHS dataset module, shhs.py, with a new feature that extracts the ECG signal from EDF files.

1. Modification of file shhs.py

  • A. Fix the function process_EEG_data() - The function currently does not work because the dataset inherits from the class BaseSignalDataset, which is deprecated in recent versions. It has been modified so that it works again.

  • B. Add new function process_ECG_data() - A new function process_ECG_data() is added, which provides the following features:

    • Advanced ECG signal processing with configurable parameters:
    • require_annotations: Optional annotation requirement (default: True)
    • select_chs: Configurable channel selection (default: ["ECG"])
    • target_fs: Target sampling frequency (default: 100 Hz)

Usage Example

```python
from pyhealth.datasets import SHHSDataset

# Initialize dataset
dataset = SHHSDataset(
    root="/path/to/SHHS/",
    dev=True,             # Development mode for faster testing
    refresh_cache=False,  # Use existing cache
)

# Process EEG data for sleep analysis
eeg_data = dataset.process_EEG_data()
print(f"Processed {len(eeg_data)} patients")

# Process ECG data with flexible annotation handling
success = dataset.process_ECG_data(
    out_dir="/output/path/",
    require_annotations=False,  # Handle missing annotations gracefully
    select_chs=["ECG"],
    target_fs=100,
)
print(f"ECG processing successful: {success}")
```
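The target_fs parameter implies that the ECG is resampled before it is written out. This description does not show how that is done, so the following is only a minimal sketch, assuming plain linear interpolation. resample_linear is a hypothetical helper, not a function in shhs.py or utils.py. Real code would more likely apply an anti-aliasing filter first, for example through scipy.signal.resample_poly.

```python
def resample_linear(signal, fs, target_fs):
    """Resample a 1-D signal from fs to target_fs by linear interpolation.

    Hypothetical helper for illustration only. It does no anti-alias
    filtering, so it is not a substitute for a proper polyphase resampler.
    """
    if fs <= 0 or target_fs <= 0:
        raise ValueError("sampling frequencies must be positive")
    if not signal:
        return []

    n_in = len(signal)
    n_out = int(round(n_in * target_fs / fs))
    resampled = []
    for k in range(n_out):
        pos = k * fs / target_fs           # output sample k in input-sample units
        lo = min(int(pos), n_in - 1)       # left neighbour
        hi = min(lo + 1, n_in - 1)         # right neighbour, clamped at the end
        frac = pos - lo
        resampled.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return resampled


# Two seconds of a ramp at an assumed 125 Hz, resampled to 100 Hz.
ramp = [float(i) for i in range(250)]
ramp_100hz = resample_linear(ramp, fs=125, target_fs=100)
```

Because the input is a ramp, each output value equals its position in input-sample units, which makes the behaviour easy to check by hand.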

2. Modification of file utils.py

This file contains the utility functions required for processing different datasets. I have added two new functions here:

read_edf_data() - processes the polysomnography signals.

  • Parameters:
    • data_path: path to the EDF file.
    • label_path: SHHS XML annotation file.
    • dataset: "SHHS" or "MESA".
    • select_chs: list of channels to extract.
    • target_fs: optional downsample frequency.

  • Returns:
    • data: (T, C) extracted channel signals.
    • fs: sampling frequency.
    • stages: stage array aligned with the signal.
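To illustrate what "stage array aligned with signal" can mean, here is a minimal sketch that expands scored sleep-stage events into one label per sample. It is not the PR's implementation. The XML layout (ScoredEvent, EventType, EventConcept, Start, Duration) is an assumption modeled on NSRR-style annotation files, the sample document is made up, and stages_per_sample is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Made-up annotation snippet; the element names are an assumed layout.
SAMPLE_XML = """<PSGAnnotation>
  <ScoredEvents>
    <ScoredEvent>
      <EventType>Stages|Stages</EventType>
      <EventConcept>Wake|0</EventConcept>
      <Start>0</Start>
      <Duration>60</Duration>
    </ScoredEvent>
    <ScoredEvent>
      <EventType>Stages|Stages</EventType>
      <EventConcept>Stage 2 sleep|2</EventConcept>
      <Start>60</Start>
      <Duration>30</Duration>
    </ScoredEvent>
    <ScoredEvent>
      <EventType>Arousals|Arousals</EventType>
      <EventConcept>Arousal|Arousal ()</EventConcept>
      <Start>70</Start>
      <Duration>5</Duration>
    </ScoredEvent>
  </ScoredEvents>
</PSGAnnotation>"""


def stages_per_sample(xml_text, n_samples, fs, unscored=-1):
    """Return one stage code per signal sample (hypothetical helper)."""
    stages = [unscored] * n_samples
    root = ET.fromstring(xml_text)
    for event in root.iter("ScoredEvent"):
        # Skip non-stage events such as arousals.
        if not event.findtext("EventType", default="").startswith("Stages"):
            continue
        # The stage code is assumed to follow the last "|" in EventConcept.
        code = int(event.findtext("EventConcept").rsplit("|", 1)[1])
        start = float(event.findtext("Start"))        # seconds
        duration = float(event.findtext("Duration"))  # seconds
        first = int(round(start * fs))
        last = min(int(round((start + duration) * fs)), n_samples)
        for i in range(first, last):
            stages[i] = code
    return stages


# 100 seconds of signal at 100 Hz; the last 10 seconds have no scored stage.
stages = stages_per_sample(SAMPLE_XML, n_samples=100 * 100, fs=100)
```

Samples that no stage event covers keep the unscored value, which is one way a caller could detect recordings with partial annotations.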

save_to_npz() - saves extracted ECG/PPG/sleep staging data to NPZ.
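The signature of save_to_npz is not given in this description. As a rough sketch of the idea, assuming the data, fs, and stages values returned by read_edf_data are stored under those same keys, NumPy's savez_compressed could be used as follows. save_to_npz_sketch, the key names, and the file name are all hypothetical.

```python
import os
import tempfile

import numpy as np


def save_to_npz_sketch(out_path, data, fs, stages):
    """Write signal, sampling rate, and stages to one compressed NPZ file.

    Hypothetical stand-in for save_to_npz; the key names are assumptions.
    """
    np.savez_compressed(
        out_path,
        data=np.asarray(data, dtype=np.float32),
        fs=np.asarray(fs),
        stages=np.asarray(stages, dtype=np.int8),
    )


# Round trip with dummy values: 3 seconds of one channel at 100 Hz.
dummy_data = np.zeros((300, 1))
dummy_stages = np.zeros(300)

with tempfile.TemporaryDirectory() as tmp_dir:
    npz_path = os.path.join(tmp_dir, "subject_0001.npz")
    save_to_npz_sketch(npz_path, dummy_data, fs=100, stages=dummy_stages)
    with np.load(npz_path) as archive:
        loaded_data = archive["data"]
        loaded_fs = int(archive["fs"])
        loaded_stages = archive["stages"]
```

Storing fs next to the signal keeps each file self-describing, so downstream code does not need to know which target_fs was used during processing.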

3. Creation of new file shhs_test.py

This file includes test cases for both the new and the existing functions of the SHHS dataset.

Testing

`python -m pytest test_shhs.py`

Soumya123p, Dec 01 '25 17:12