
add-tcga-paad-dataset-sadiq5

Open • mustafa-sadiq opened this issue 1 month ago • 0 comments

Add TCGA-PAAD Dataset (Pancreatic Adenocarcinoma)

  • Introduces TCGAPAADDataset for the TCGA-PAAD cohort, with standardized mutations and clinical tables.
  • Config: tcga_paad.yaml defines two tables: mutations (hugo_symbol, variant_classification, variant_type, hgvsc, hgvsp, tumor_sample_barcode) and clinical (age_at_diagnosis, vital_status, days_to_death, tumor_stage).
  • Usage (calling set_task() with no arguments uses the default CancerSurvivalPrediction() task):

        from pyhealth.datasets import TCGAPAADDataset

        dataset = TCGAPAADDataset(root="path/to/TCGA-PAAD")
        samples = dataset.set_task()  # defaults to CancerSurvivalPrediction()
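To make the two-table schema concrete, here is a small stdlib-only sketch that mirrors the column lists from tcga_paad.yaml and checks a CSV header against them. The TCGA_PAAD_TABLES dict and the missing_columns helper are hypothetical illustrations, not part of PyHealth or of the YAML config itself. Header names are lowercased here only so that the check is case-insensitive; this is an assumption of the sketch, not a claim about how PyHealth loads the files.

```python
import csv
import io
from typing import Dict, List

# Hypothetical mirror of the two tables described for tcga_paad.yaml.
# Column names are copied from the PR description; this is not the config file.
TCGA_PAAD_TABLES: Dict[str, List[str]] = {
    "mutations": [
        "hugo_symbol",
        "variant_classification",
        "variant_type",
        "hgvsc",
        "hgvsp",
        "tumor_sample_barcode",
    ],
    "clinical": [
        "age_at_diagnosis",
        "vital_status",
        "days_to_death",
        "tumor_stage",
    ],
}


def missing_columns(table: str, csv_text: str) -> List[str]:
    """Return the expected columns of `table` that are absent from the CSV header.

    Hypothetical helper: header names are stripped and lowercased so that,
    e.g., "Hugo_Symbol" matches "hugo_symbol".
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input -> no header -> every column missing
    present = {name.strip().lower() for name in header}
    return [col for col in TCGA_PAAD_TABLES[table] if col not in present]


if __name__ == "__main__":
    # A clinical header that lacks tumor_stage (header only, no data rows).
    print(missing_columns("clinical", "age_at_diagnosis,vital_status,days_to_death\n"))
```

Running the script prints ['tumor_stage'], since that is the only clinical column the header lacks; a complete header returns an empty list.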

Testing

python -m pytest tests/core/test_tcga_paad.py -q

PS C:\Users\musta\OneDrive\UIUC MS CS 2024\CS 598 - Deep Learning for Healthcare\Project\PyHealth> python -m pytest tests/core/test_tcga_paad.py -q
....... [100%]
7 passed in 5.29s

mustafa-sadiq • Dec 08 '25 03:12