Add ECG Processing Support to SHHS dataset for Sleep Signal Analysis
Students: Soumya Mazumder (soumyam4), Andrew Salazar (aas15), Sharon Lin (xinyiyl2)
Paper Title: WatchSleepNet: A Novel Model and Pretraining Approach for Advancing Sleep Staging with Smartwatches
Paper Link:
Overview
This PR extends PyHealth's support for the Sleep Heart Health Study (SHHS) dataset, enabling researchers to work with polysomnographic signals, including EEG and ECG data, for sleep-related cardiovascular and neurological research.
Changes
An overview of the SHHS data can be found here - Sleep Heart Health Study (SHHS).
This PR enhances the existing SHHS dataset support in shhs.py by adding a new feature that extracts ECG signals from EDF files.
1. Modification of file shhs.py
A. Fix the function process_EEG_data() - The function currently fails because it depends on the class BaseSignalDataset, which is deprecated in the recent version. It has been modified so that it works again.
B. Add new function process_ECG_data() - This new function provides ECG signal processing with the following configurable parameters:
- require_annotations: Whether annotations are required (default: True)
- select_chs: Channels to extract (default: ["ECG"])
- target_fs: Target sampling frequency (default: 100 Hz)
Usage Example
```python
from pyhealth.datasets import SHHSDataset

# Initialize dataset
dataset = SHHSDataset(
    root="/path/to/SHHS/",
    dev=True,             # Development mode for faster testing
    refresh_cache=False,  # Use existing cache
)

# Process EEG data for sleep analysis
eeg_data = dataset.process_EEG_data()
print(f"Processed {len(eeg_data)} patients")

# Process ECG data with flexible annotation handling
success = dataset.process_ECG_data(
    out_dir="/output/path/",
    require_annotations=False,  # Handle missing annotations gracefully
    select_chs=["ECG"],
    target_fs=100,
)
print(f"ECG processing successful: {success}")
```
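The target_fs parameter implies that the ECG signal is resampled to a common rate. The sketch below illustrates one simple way to do that with linear interpolation in NumPy. It is not the implementation in this PR: the helper name resample_linear and the 125 Hz input rate are assumptions chosen for illustration, and linear interpolation applies no anti-aliasing filter, so real code would likely prefer a polyphase or filtered resampler.

```python
import numpy as np


def resample_linear(signal, fs, target_fs):
    """Resample a 1-D signal from fs to target_fs by linear interpolation.

    Hypothetical helper for illustration only; no anti-aliasing is applied.
    """
    signal = np.asarray(signal, dtype=float)
    if fs == target_fs:
        return signal
    duration = len(signal) / fs               # seconds covered by the input
    n_out = int(round(duration * target_fs))  # number of output samples
    t_in = np.arange(len(signal)) / fs        # original sample times
    t_out = np.arange(n_out) / target_fs      # resampled sample times
    return np.interp(t_out, t_in, signal)


# Example: 2 s of a 1 Hz sine at an assumed 125 Hz, resampled to 100 Hz
fs, target_fs = 125, 100
t = np.arange(2 * fs) / fs
ecg_like = np.sin(2 * np.pi * 1.0 * t)
resampled = resample_linear(ecg_like, fs, target_fs)
print(resampled.shape)
```

Resampling on a time grid rather than by dropping samples handles non-integer ratios such as 125 Hz to 100 Hz, which plain decimation cannot.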
2. Modification of file utils.py
This file contains the utility functions required for processing different datasets. Two new functions have been added here:
- read_edf_data() - processes the polysomnography signals.
  Parameters:
  - data_path: path to EDF file.
  - label_path: SHHS XML annotation file.
  - dataset: "SHHS" or "MESA".
  - select_chs: list of channels to extract.
  - target_fs: optional downsample frequency.
  Returns:
  - data: (T, C) extracted channel signals.
  - fs: sampling frequency.
  - stages: stage array aligned with signal.
- save_to_npz() - saves extracted ECG/PPG/sleep staging data to NPZ.
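To make the "stage array aligned with signal" and NPZ output concrete, here is a minimal sketch of how per-epoch sleep stages could be expanded to one label per sample and stored next to the signal. Everything in it is assumed for illustration: the helper names expand_epoch_stages and save_npz_sketch, the 30-second epoch length, the integer stage codes, and the NPZ keys data, fs, and stages are not taken from this PR's code.

```python
import os
import tempfile

import numpy as np

EPOCH_SEC = 30  # assumed epoch length for sleep staging


def expand_epoch_stages(epoch_stages, fs, n_samples, epoch_sec=EPOCH_SEC):
    """Repeat one label per epoch so there is one label per signal sample."""
    per_sample = np.repeat(np.asarray(epoch_stages), int(fs * epoch_sec))
    # Trim (never pad) so that labels and signal share a common length.
    n = min(len(per_sample), n_samples)
    return per_sample[:n], n


def save_npz_sketch(path, data, fs, stages):
    """Store signal, sampling rate, and stages in one compressed NPZ file."""
    np.savez_compressed(path, data=data, fs=fs, stages=stages)


fs = 100
data = np.zeros((3 * EPOCH_SEC * fs, 1))  # (T, C): three epochs, one channel
stages, n = expand_epoch_stages([0, 2, 5], fs, data.shape[0])  # arbitrary codes
data = data[:n]

out_path = os.path.join(tempfile.mkdtemp(), "subject_sketch.npz")
save_npz_sketch(out_path, data, fs, stages)

loaded = np.load(out_path)
print(loaded["data"].shape, int(loaded["fs"]), loaded["stages"][[0, 3000, 6000]])
```

Keeping the signal, sampling rate, and stages in a single file means a downstream loader does not need to re-parse the EDF or XML sources.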
3. Creation of new file shhs_test.py
This file includes test cases for both the new and the existing functions of the SHHS dataset.
Testing
`python -m pytest test_shhs.py`