Add dataset SIIM-ISIC
Contributor: Andrew Kolarits
Contribution Type: New Dataset (SIIM-ISIC) + DICOM Image Processing Utilities
Summary:
This PR adds full support for the SIIM-ISIC Melanoma Classification Dataset, including metadata loading, DICOM image handling, and a preprocessing pipeline to prepare dermoscopic images for machine learning models. The SIIM-ISIC dataset contains clinical metadata and dermoscopic images used for melanoma detection, lesion classification, and multimodal learning.
SIIM-ISIC data is publicly available:
- ISIC Archive: https://challenge2020.isic-archive.com
- Kaggle: https://www.kaggle.com/c/siim-isic-melanoma-classification
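To illustrate the kind of normalization step the preprocessing pipeline refers to, here is a minimal, dependency-free sketch of min-max scaling. It is not the code in this PR: the function name `min_max_normalize` and the sample intensities are hypothetical, and a real pipeline would more likely read pixel data with a DICOM library such as pydicom and operate on arrays rather than nested lists.

```python
from typing import List, Sequence


def min_max_normalize(pixels: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale a 2-D grid of pixel intensities to the range [0.0, 1.0]."""
    flat = [value for row in pixels for value in row]
    if not flat:
        return []
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant image has no contrast; map every pixel to 0.0
        # instead of dividing by zero.
        return [[0.0 for _ in row] for row in pixels]
    return [[(value - lo) / span for value in row] for row in pixels]


# Hypothetical 2x2 image with made-up raw intensities.
image = [[0, 1024], [2048, 4096]]
normalized = min_max_normalize(image)
```

Scaling to a fixed range keeps images from different acquisition devices on a comparable footing before they reach a model; the constant-image guard is one possible convention, not necessarily the one used here.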
What's Included:
- Dataset class that inherits from BaseDataset
- YAML configuration file for data loading
- Test suite with all tests passing
- Documentation following PyHealth standards
Files Added / Modified:
Dataset
pyhealth/datasets/siim_isic.py
pyhealth/datasets/configs/siim_isic.yaml
Documentation
docs/api/datasets/pyhealth.datasets.SIIMISICDataset.rst
Tests
tests/core/test_siim_isic.py
Synthetic test files under test-resources/core/siim_isic/
Testing Status:
All dataset and preprocessing tests pass, covering:
- DICOM reading + normalization
- Patient lookup
- Metadata consistency
- Dataset statistics
- End-to-end sample retrieval
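As a rough illustration of what the patient lookup and metadata consistency checks listed above might look like, the sketch below groups synthetic metadata rows by patient and flags patients whose recorded sex differs across images. Everything here is an assumption for illustration: the helper names, the inline CSV, and the choice of columns (modeled on the public Kaggle `train.csv` layout) are not taken from the test suite in this PR.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Hypothetical synthetic metadata; the column names are assumed to match
# the public Kaggle train.csv, not necessarily the PR's test fixtures.
SYNTHETIC_CSV = """image_name,patient_id,sex,age_approx,target
ISIC_0000001,IP_0000001,male,45,0
ISIC_0000002,IP_0000001,male,45,1
ISIC_0000003,IP_0000002,female,60,0
"""


def group_by_patient(csv_text: str) -> Dict[str, List[dict]]:
    """Index metadata rows by patient_id (one patient can have many images)."""
    patients = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        patients[row["patient_id"]].append(row)
    return dict(patients)


def inconsistent_patients(patients: Dict[str, List[dict]], field: str = "sex") -> List[str]:
    """Return the sorted IDs of patients whose rows disagree on `field`."""
    return sorted(
        pid for pid, rows in patients.items()
        if len({row[field] for row in rows}) > 1
    )


patients = group_by_patient(SYNTHETIC_CSV)
```

Calling `inconsistent_patients(patients)` on this sample returns an empty list, since each patient's rows agree on `sex`.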