
Add dataset SIIM-ISIC

Open andyroo123 opened this issue 1 month ago • 0 comments

Contributor: Andrew Kolarits ([email protected])

Contribution Type: New Dataset (SIIM-ISIC) + DICOM Image Processing Utilities

Summary:

This PR adds full support for the SIIM-ISIC Melanoma Classification Dataset, including metadata loading, DICOM image handling, and a preprocessing pipeline to prepare dermoscopic images for machine learning models. The SIIM-ISIC dataset contains clinical metadata and dermoscopic images used for melanoma detection, lesion classification, and multimodal learning.

SIIM-ISIC data is publicly available:

  • ISIC Archive: https://challenge2020.isic-archive.com
  • Kaggle: https://www.kaggle.com/c/siim-isic-melanoma-classification

What’s Included:

Dataset:

  • Dataset class that inherits from BaseDataset
  • YAML configuration file for data loading
  • Test suite with all tests passing
  • Documentation following PyHealth standards

Files Added / Modified:

Dataset:

  • pyhealth/datasets/siim_isic.py
  • pyhealth/datasets/configs/siim_isic.yaml

Documentation:

  • docs/api/datasets/pyhealth.datasets.SIIMISICDataset.rst

Tests:

  • tests/core/test_siim_isic.py
  • Synthetic test files under test-resources/core/siim_isic/

Testing Status:

All dataset + preprocessing tests pass successfully:

  • DICOM reading + normalization
  • Patient lookup
  • Metadata consistency
  • Dataset statistics
  • End-to-end sample retrieval
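
For reviewers who want a feel for what the patient lookup and metadata consistency checks might cover, here is a minimal sketch using only the Python standard library. It is not the code in pyhealth/datasets/siim_isic.py. The helper names (`load_rows`, `index_by_patient`, `find_inconsistent_labels`) and the sample rows are hypothetical, and the columns are a subset of those in the Kaggle train.csv (`image_name`, `patient_id`, `sex`, `age_approx`, `benign_malignant`, `target`).

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the layout of the Kaggle train.csv (subset of columns).
# The IDs and values below are illustrative only, not real records.
SAMPLE_CSV = """image_name,patient_id,sex,age_approx,benign_malignant,target
ISIC_0000001,IP_0000001,male,45.0,benign,0
ISIC_0000002,IP_0000001,male,45.0,malignant,1
ISIC_0000003,IP_0000002,female,60.0,benign,0
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def index_by_patient(rows):
    """Map each patient_id to the list of image names for that patient."""
    by_patient = defaultdict(list)
    for row in rows:
        by_patient[row["patient_id"]].append(row["image_name"])
    return dict(by_patient)


def find_inconsistent_labels(rows):
    """Return image names whose text label disagrees with the binary target."""
    expected = {"benign": "0", "malignant": "1"}
    return [
        row["image_name"]
        for row in rows
        if expected.get(row["benign_malignant"]) != row["target"]
    ]


rows = load_rows(SAMPLE_CSV)
patients = index_by_patient(rows)
inconsistent = find_inconsistent_labels(rows)
```

Here `patients` groups images per patient, which is the shape a patient-level lookup needs, and `inconsistent` stays empty when the `benign_malignant` and `target` columns agree. The actual tests in tests/core/test_siim_isic.py may check different properties.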

andyroo123 • Dec 06 '25 20:12