datahub-study-curation-tools

Upgrade scripts used for clinical data ingestion to Python 3

Open rxu17 opened this issue 4 months ago • 0 comments

Problem:

For the iAtlas to cBioPortal project, we need to run our processing pipeline for the clinical and MAF datasets under Python 3. To avoid rewriting code on our end, we reuse several scripts from the datahub-study-curation-tools repo, namely:

  • OncoTree mapping (maps the ONCOTREE_CODE values in our clinical files to CANCER_TYPE and CANCER_TYPE_DESCRIPTION)
  • add clinical header (the format cBioPortal requires for ingesting clinical files)
  • generate metadata files (files cBioPortal requires for ingesting clinical files)
  • generate case lists (files cBioPortal requires for ingesting clinical files)

However, these scripts are written for Python 2.
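To illustrate what one of these steps involves, here is a minimal Python 3 sketch of adding a clinical header. It is not the repo's actual script. It assumes the cBioPortal clinical file layout of four `#`-prefixed rows (display names, descriptions, datatypes, priorities) placed above the column-name row. The function name `add_clinical_header`, the `COLUMN_METADATA` table, and the fallback values for unknown columns are all hypothetical and chosen only for this example.

```python
# Hypothetical per-column metadata: (display name, description, datatype, priority).
# These values are illustrative only; a real pipeline would look them up elsewhere.
COLUMN_METADATA = {
    "PATIENT_ID": ("Patient Identifier", "Identifier to uniquely specify a patient.", "STRING", "1"),
    "AGE": ("Age", "Age at diagnosis.", "NUMBER", "1"),
}


def add_clinical_header(tsv_text, metadata=COLUMN_METADATA):
    """Prepend four '#'-prefixed metadata rows to tab-separated clinical data.

    The first line of tsv_text is expected to hold the column names.
    """
    lines = tsv_text.splitlines()
    columns = lines[0].split("\t")

    header_rows = []
    for field_index in range(4):
        # Assumed fallback for unknown columns: reuse the column name, STRING, priority 1.
        cells = [
            metadata.get(col, (col, col, "STRING", "1"))[field_index]
            for col in columns
        ]
        header_rows.append("#" + "\t".join(cells))

    return "\n".join(header_rows + lines) + "\n"


if __name__ == "__main__":
    sample = "PATIENT_ID\tAGE\nP1\t50\n"
    print(add_clinical_header(sample), end="")
```

The sketch works on strings rather than file paths so that it can be tried without any input files; writing the result back to disk is left to the caller.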

Solution:

Here we port these scripts from Python 2 to Python 3 so that we can use them in our pipeline.

Main changes are the following:

Testing:

  • Tested on the data from the iAtlas to cBioPortal project; the results were successfully ingested into cBioPortal and validated

rxu17, Aug 27 '25 06:08