February 2024 public release studies

Open Rima-Waleed opened this issue 1 year ago • 1 comments

Cancer studies updated in this pull request: *All studies had the first round of reviews

  • cll_broad_2022 (https://triage.cbioportal.mskcc.org/study/summary?id=cll_broad_2022)
  • mbn_sfu_2023 (https://triage.cbioportal.mskcc.org/study/summary?id=mbn_sfu_2023)
  • brain_ccma_2023 (https://triage.cbioportal.mskcc.org/study/summary?id=brain_ccma_2023) *Working on adding RNA-seq data
  • sarcoma_msk_2022 (https://triage.cbioportal.mskcc.org/study/summary?id=sarcoma_msk_2022)
  • ccle_broad_2023 (https://triage.cbioportal.mskcc.org/study/summary?id=ccle_broad_2023)
  • difg_msk_2023 (https://triage.cbioportal.mskcc.org/study/summary?id=difg_msk_2023)
  • msk_spectrum_tme_2022 (https://triage.cbioportal.mskcc.org/study/summary?id=msk_spectrum_tme_2022)

Rima-Waleed avatar Feb 07 '24 17:02 Rima-Waleed

Thanks for putting this together @Rima-Waleed @sbabyanusha! I think we can enhance the data a bit more. See below

cll_broad_2022:

  • [x] 1075 samples went through WES or WGS sequencing, but the sequenced case list shows only 984 samples. The data_mutations file has 1034 samples (WES variants + only WGS driver variants). I think the WGS part is incomplete. Can we double-check this?
    Removed WGS driver variants
  • [x] Can we also add a z-score file and a case-list for RNA-seq data?
  • [x] Mutsig data is available in Supp Table 4a. Missing number of bases covered (N/ Nnon). Data pending from author.
  • [x] We can add the Focal/Arm level CNA data from Supp Table 7b in Generic Assay format.
  • [x] IGLV3-21 R110 mutation status etc., from Supp Table 8a & U1 Status from Supp Table 3e can be added to clinical for Oncoprint. See Fig 1c.
  • [x] Multiple events are clubbed into one Sample Acquisition timepoint in the timeline. This can be split into Diagnosis, Prior Treatment, Sample Acquisition, and Deceased events.
  • [x] Is the t0 Sample Acquisition for treatment track?
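The count mismatch raised for cll_broad_2022 (984 samples in the sequenced case list versus 1034 in data_mutations) can be narrowed down by diffing the two sample sets. The following is a minimal sketch, assuming the usual cBioPortal layouts: a case list file with a tab-separated `case_list_ids:` line and a MAF-style mutation file with a `Tumor_Sample_Barcode` column. The function names are hypothetical and are not part of any cBioPortal tool.

```python
import csv


def case_list_ids(case_list_text):
    """Return the sample IDs on the `case_list_ids:` line of a case list file."""
    for line in case_list_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "case_list_ids":
            # IDs are assumed to be tab-separated on a single line.
            return {s for s in value.strip().split("\t") if s}
    return set()


def maf_sample_ids(maf_text):
    """Return the distinct Tumor_Sample_Barcode values of a MAF-style table."""
    # Skip blank lines and leading comment lines such as "#version 2.4".
    rows = [l for l in maf_text.splitlines() if l and not l.startswith("#")]
    reader = csv.DictReader(rows, delimiter="\t")
    return {row["Tumor_Sample_Barcode"] for row in reader}


def compare_samples(case_list_text, maf_text):
    """Report samples present in only one of the two files."""
    listed = case_list_ids(case_list_text)
    mutated = maf_sample_ids(maf_text)
    return {
        "only_in_case_list": sorted(listed - mutated),
        "only_in_maf": sorted(mutated - listed),
    }
```

Samples that appear only in the case list are not necessarily an error, since a sequenced sample may simply have no called mutations. Samples that appear only in the mutation file are the ones worth checking first.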

mbn_sfu_2023:

  • [x] Can we remove "newly sequenced" from the description, and can "297 pediatric burrkitt lymphoma (BL)" be corrected, since the cohort has adult BLs too?
  • [x] Per the paper, 230 samples were analyzed for SVs. Can the SV case list count be fixed? Supplements don't indicate which samples were analyzed, only those having MYC rearrangements. Emailed author for confirmation.
  • [x] Can the SV description be updated to indicate that only MYC rearrangements were analyzed?
  • [x] RNA-seq data is not showing up in the Genomic Profiles table. Can we also add an RNA-seq case list?
  • [x] Subgroup from Supp Table 12 can be added to clinical.
  • [x] Was the same build used for both BLs and DLBCLs, and for both MAF and RNA-seq? Confirmed GRCh38 is the correct build. GN annotations (protein change) match the paper's annotations.
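For the RNA-seq case list requested above, one option is to derive the sample IDs directly from the header of the expression data file so the list cannot drift from the data. This is a rough sketch, assuming the expression file header starts with `Hugo_Symbol` and/or `Entrez_Gene_Id` followed by one column per sample. The `build_case_list` helper and the example stable ID suffix are illustrative assumptions, not taken from the thread.

```python
def build_case_list(study_id, suffix, name, description, data_header_line):
    """Build the text of a case list file from an expression file header.

    `suffix` is appended to the study ID to form the stable_id
    (for example "rna_seq_mrna"; treat that value as an assumption).
    """
    columns = data_header_line.rstrip("\n").split("\t")
    # Gene identifier columns are not samples.
    non_sample = {"Hugo_Symbol", "Entrez_Gene_Id"}
    samples = [c for c in columns if c and c not in non_sample]
    lines = [
        f"cancer_study_identifier: {study_id}",
        f"stable_id: {study_id}_{suffix}",
        f"case_list_name: {name}",
        f"case_list_description: {description} ({len(samples)} samples)",
        "case_list_ids: " + "\t".join(samples),
    ]
    return "\n".join(lines) + "\n"
```

Including the sample count in the description makes mismatches like the SV case list count easier to spot during review.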

brain_ccma_2023:

  • [x] Can we update the description to: "Whole-genome and transcriptome sequencing of 182 cell lines derived from pediatric brain cancer and sarcomas. Data is available through the Childhood Cancer Model Atlas data portal."
  • [x] There are multiple rows per gene in the Methylation file due to multiple probes. How are we handling this? Should we average the values across the probes? Added methylation probes
  • [x] Can we also add methylation and RNA-seq case lists? Methylation case list added. RNA-seq data (correct format) pending from author.
  • [x] Can we use CCMA_CNVcallings.csv table to convert to CNA format? Correct format pending from author.
  • [x] Many samples are missing the Cancer types/oncotree codes. I don't see any related info on CCMA. Can we get that info from the authors? No malignancy samples
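The probe-averaging option floated for the brain_ccma_2023 methylation file could look like the sketch below. The thread indicates the probes were ultimately added rather than averaged, so this only illustrates the alternative. It assumes a tab-separated table whose first column is the gene symbol and whose remaining columns are per-sample values, with `NA` or an empty cell marking a missing value. The function name is hypothetical.

```python
from collections import defaultdict


def average_probes_per_gene(tsv_text):
    """Collapse multiple probe rows per gene into one row of per-sample means.

    Returns (sample_ids, {gene: [mean or None, ...]}). Missing values are
    ignored; a sample with no values for a gene gets None.
    """
    lines = [l for l in tsv_text.splitlines() if l.strip()]
    samples = lines[0].split("\t")[1:]
    sums = defaultdict(lambda: [0.0] * len(samples))
    counts = defaultdict(lambda: [0] * len(samples))
    for line in lines[1:]:
        fields = line.split("\t")
        gene = fields[0]
        # Touch both dicts first so genes with only missing values are kept.
        gene_sums, gene_counts = sums[gene], counts[gene]
        for i, raw in enumerate(fields[1:len(samples) + 1]):
            if raw in ("", "NA"):
                continue
            gene_sums[i] += float(raw)
            gene_counts[i] += 1
    averaged = {
        gene: [
            sums[gene][i] / counts[gene][i] if counts[gene][i] else None
            for i in range(len(samples))
        ]
        for gene in sums
    }
    return samples, averaged
```

Averaging hides probe-level differences (for example promoter versus gene-body probes), which is one reason keeping the individual probes can be preferable.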

sarcoma_msk_2022:

  • [x] Update the journal in study name to "(MSK, Nat Commun 2022)"
  • [x] The panel info is missing. Table S1 has the gene list. Gene panel sent to Rob for import.
  • [x] Many samples are missing Oncotree Code/cancer type info. The counts don't match for a few types, see Fig1. No malignancy samples don't have an Oncotree code, and samples that don't have oncotree codes for their cancer type were grouped with the closest type, e.g. Sarcoma, unclassified and Sarcoma, NOS.
  • [x] Do we have details about the types of events included in the data_sv.txt file? It would be beneficial to include the Event_Info column. Data not available.
  • [x] Paper mentions "13,239 copy number alterations" were observed. The study is missing copy number data. Data pending from author.
  • [x] "" in clinical data.

ccle_broad_2023: --> Holding off for February release

  • [x] Remove "For more info about the proteomics data, see the README." from description? There seems to be no reference to protein data in readme.
  • [x] The Pubmed link refers to 2019 publication. Is that right?
  • [x] Oncotree code/Cancer type missing for a few samples. Histology information seems to be complete, can this be used to infer Oncotree codes? Those have histology: immortalized cells and fibroblasts (non-cancerous)
  • [x] "" in clinical data
  • [ ] The all-samples case list count doesn't match the cohort size. Were all the samples sequenced for Mutations, CNA and SV?
  • [ ] The FGA is close to 1 for almost all the samples 🧐
  • [x] We can add SAMPLE_CLASS (Cell Line) attribute to clinical.
  • [x] Are the values in data_rna_seq_v2_mrna.txt log2(TPM+1)? Can the metadata be updated to include the expression data type, and can the file be renamed according to the recommended staging filenames?
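One quick sanity check for the log2(TPM+1) question relies on the fact that TPM values sum to one million per sample. Undoing the transform with 2^v - 1 should therefore give a per-sample total close to 1,000,000. The sketch below is a heuristic only, assuming the column holds all genes for one sample (a gene-filtered file will sum to less). The function name and the tolerance are assumptions.

```python
import math


def looks_like_log2_tpm_plus_1(values, tolerance=0.05):
    """Heuristically test whether one sample's values are log2(TPM + 1).

    `values` is the full expression column for a single sample.
    """
    # log2(1e6 + 1) is just under 20, so anything outside [0, 20]
    # cannot be log2(TPM + 1); this also avoids overflow in 2.0 ** v.
    if any(v < 0 or v > 20 for v in values):
        return False
    total = sum(2.0 ** v - 1.0 for v in values)
    return math.isclose(total, 1_000_000, rel_tol=tolerance)
```

A passing check does not prove the data type, and a failing one may only mean the gene set was filtered, so the result is best confirmed against the data source's documentation before updating the metadata.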

difg_msk_2023:

  • [x] Can "Miscellaneous Neuroepithelial Tumor" Oncotree code/CT/CTD be updated to Glioma?

msk_spectrum_tme_2022:

  • [x] Matched normal status in description
  • [x] Myriad GIS Score can be number.
  • [x] What is Patient Isabl ID?
  • [x] Can we add the WGS samples to panel matrix file?

rmadupuri avatar Feb 12 '24 18:02 rmadupuri