Add SOZStimulationDataset for Seizure Onset Zone Localization from SPES EEG
Contributors: Vineetha Gurrala (vgurr4), Clay Douglas (ckd6)
Type: Dataset Contribution
Description: This PR introduces SOZStimulationDataset, a PyTorch-compatible dataset class for single-pulse electrical stimulation (SPES) EEG recordings used for seizure onset zone (SOZ) localization, replicating the dataset structure used in our reproducibility project:
Reproducing: SAMIL: Spatial Attention-based Multi-modal Integration for Localizing Seizure Onset Zones from Single-pulse Electrical Stimulation
This dataset class does not include raw EEG data because of size and IRB constraints. Instead, it provides a standardized loading interface compatible with the preprocessed NumPy outputs typically generated in SPES-based epilepsy research pipelines.
The expected input format is:
    root/
    ├── train_X_stim.npy   # [N_train, C, T]
    ├── train_y.npy        # [N_train]
    ├── val_X_stim.npy
    ├── val_y.npy
    ├── test_X_stim.npy
    └── test_y.npy
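Because the raw recordings are not distributed, a reviewer may want placeholder files in this layout to exercise the loader. The following is a minimal sketch under stated assumptions: `write_synthetic_splits` is a hypothetical helper that is not part of this PR, and the array sizes, float32/int64 dtypes, and random values are illustrative choices only.

```python
import tempfile
from pathlib import Path

import numpy as np


def write_synthetic_splits(root, n_per_split=(8, 4, 4), channels=16, timesteps=128, seed=0):
    """Write random arrays in the expected layout (hypothetical helper, not in the PR)."""
    rng = np.random.default_rng(seed)
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    for split, n in zip(("train", "val", "test"), n_per_split):
        # X has shape [N, C, T]; y has shape [N] with binary labels.
        X = rng.standard_normal((n, channels, timesteps)).astype(np.float32)
        y = rng.integers(0, 2, size=n).astype(np.int64)
        np.save(root / f"{split}_X_stim.npy", X)
        np.save(root / f"{split}_y.npy", y)
    return root


# Write the six files into a temporary directory and list them.
out_dir = write_synthetic_splits(tempfile.mkdtemp())
print(sorted(p.name for p in out_dir.iterdir()))
```

The values carry no physiological meaning; they only make shape and dtype checks possible without access to patient data.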
Each sample returns: {"X_stim": Tensor[C, T]}, label
Where label ∈ {0,1} corresponds to SOZ vs non-SOZ stimulation sites.
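To make the sample contract concrete, here is a rough sketch of a loader that satisfies it. It is not the implementation in this PR: `SketchSOZDataset` is a hypothetical stand-in, and the float32 cast, the eager `np.load`, and the validation checks are assumptions made for illustration.

```python
from pathlib import Path

import numpy as np
import torch
from torch.utils.data import Dataset


class SketchSOZDataset(Dataset):
    """Illustrative stand-in for the documented interface, not the PR's class."""

    def __init__(self, root, split="train"):
        if split not in ("train", "val", "test"):
            raise ValueError(f"unknown split: {split!r}")
        root = Path(root)
        self.X = np.load(root / f"{split}_X_stim.npy")  # [N, C, T]
        self.y = np.load(root / f"{split}_y.npy")       # [N]
        if self.X.ndim != 3 or len(self.X) != len(self.y):
            raise ValueError("expected X of shape [N, C, T] and y of shape [N]")

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        # Assumption: samples are exposed as float32 tensors of shape [C, T].
        x = torch.from_numpy(np.asarray(self.X[idx], dtype=np.float32))
        label = int(self.y[idx])
        return {"X_stim": x}, label
```

Returning the features in a dict keyed by `"X_stim"` leaves room for additional modalities under other keys without changing the `(features, label)` tuple shape.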
The PR also includes:
- Example usage script in examples/soz_stimulation_example.py
- Minimal test case validating dataset load behavior
- Documentation describing the expected data format, with a reference to the original study
Usage:
    from pyhealth.datasets import SOZStimulationDataset
    ds = SOZStimulationDataset(root="data/soz_spes_processed", split="train")
    x, y = ds[0]
    print(x["X_stim"].shape, y)
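Samples of the form `({"X_stim": Tensor[C, T]}, label)` should batch with PyTorch's default collate function, which stacks dict values key by key. The sketch below shows this with a small in-memory stand-in so it runs without the data files; `InMemorySOZ` and its sizes are hypothetical and are not part of this PR.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class InMemorySOZ(Dataset):
    """Hypothetical stand-in yielding samples in the documented format."""

    def __init__(self, n=10, channels=4, timesteps=32):
        g = torch.Generator().manual_seed(0)
        self.X = torch.randn(n, channels, timesteps, generator=g)
        self.y = torch.randint(0, 2, (n,), generator=g)

    def __len__(self):
        return self.X.shape[0]

    def __getitem__(self, idx):
        return {"X_stim": self.X[idx]}, int(self.y[idx])


loader = DataLoader(InMemorySOZ(), batch_size=4, shuffle=False)
batch_x, batch_y = next(iter(loader))
print(batch_x["X_stim"].shape)  # torch.Size([4, 4, 32])
print(batch_y.shape)            # torch.Size([4])
```

No custom `collate_fn` is needed here because every sample in a split shares the same `[C, T]` shape.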