datahub
Public Release Studies: February 2024
Fix #2005
Cancer studies updated in this pull request:
Study_Id | Testing Instance Link | Sample Count |
---|---|---|
difg_msk_2023 | https://triage.cbioportal.mskcc.org/study/summary?id=difg_msk_2023 | 73 Samples |
cll_broad_2022 | https://triage.cbioportal.mskcc.org/study/summary?id=cll_broad_2022 | 1154 Samples |
brain_ccma_2023 | https://triage.cbioportal.mskcc.org/study/summary?id=brain_ccma_2023 | 182 Samples |
msk_spectrum_tme_2022 | https://triage.cbioportal.mskcc.org/study/summary?id=msk_spectrum_tme_2022 | 82 Samples |
sarcoma_msk_2022 | https://triage.cbioportal.mskcc.org/study/summary?id=sarcoma_msk_2022 | 7494 Samples |
mbn_sfu_2023 | https://triage.cbioportal.mskcc.org/study/summary?id=mbn_sfu_2023 | 297 Samples |
Total | | 9282 Samples |
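As a quick sanity check, the Total row can be reproduced by summing the per-study counts. This is a minimal Python sketch; the counts are transcribed by hand from the table.

```python
# Sample counts transcribed from the table (study ID -> number of samples).
sample_counts = {
    "difg_msk_2023": 73,
    "cll_broad_2022": 1154,
    "brain_ccma_2023": 182,
    "msk_spectrum_tme_2022": 82,
    "sarcoma_msk_2022": 7494,
    "mbn_sfu_2023": 297,
}

total = sum(sample_counts.values())
print(f"Total: {total} samples")  # Total: 9282 samples
```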
NOTE:
sarcoma_msk_2022
- CNA pending from author
brain_ccma_2023
- RNA-seq & CNA pending from author
cll_broad_2022
- Missing MutSig data pending from author.
difg_msk_2023 & msk_spectrum_tme_2022
- Study has been updated and merged into the master branch.
*Missing data will be added as soon as authors respond.
Thanks for the updates, Rima! Here are a few more issues I noticed.
brain_ccma
- [x] The sequenced case list does not reflect the WGS count (149) from the paper.
- [x] There are 198k rows with just probe names and empty values in the file. Is the Methylation data formatted correctly?
- [x] RNA-seq data is available on the CCMA portal and can be converted to the matrix format. Pending from author by EOW.
- [x] There's also drug screening data. We can add this in Generic Assay format. Pending from author by EOW.
- [x] The paper states there are only 2 normal tissue primary cell lines, but there are 38 in the portal. Can we double-check? The paper mentions 2 normal tissue primary cells in the atlas; however, there are 38 non-malignant cell lines.
- [x] Some mutations seem to be missing in the portal. A few samples show up as unprofiled in the portal although the paper shows variants for them (see the OncoPrint in Figure 1B of the paper, e.g., the Osteosarcoma type). The mutation types in the paper and the portal also differ somewhat. Reached out to the authors, as I found more discrepancies in the mutation data.
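For the sequenced case list item, here is a minimal sketch of generating a `cases_sequenced.txt`-style file restricted to the WGS samples. It assumes the usual cBioPortal case list layout: `key: value` lines, tab-separated `case_list_ids`, and, by convention, a `stable_id` prefixed with the study ID. The sample IDs, case list name, and description wording are hypothetical.

```python
def build_sequenced_case_list(study_id, sample_ids):
    """Render a cBioPortal-style 'sequenced' case list (key: value lines)."""
    # De-duplicate while preserving order so the count in the description is accurate.
    unique_ids = list(dict.fromkeys(sample_ids))
    lines = [
        f"cancer_study_identifier: {study_id}",
        f"stable_id: {study_id}_sequenced",
        "case_list_name: Samples with mutation data",
        f"case_list_description: Samples with WGS mutation data ({len(unique_ids)} samples)",
        "case_list_ids: " + "\t".join(unique_ids),
    ]
    return "\n".join(lines) + "\n"


# Hypothetical sample IDs; in practice this list would hold the 149 WGS samples.
wgs_samples = ["CCMA_0001", "CCMA_0002", "CCMA_0002", "CCMA_0003"]
text = build_sequenced_case_list("brain_ccma_2023", wgs_samples)
print(text)
```

With the real sample list, the count in `case_list_description` should read 149 if the case list matches the paper.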
We will have to hold off the CCMA study for a closer review.
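For the methylation item (198k rows with only probe names), here is a sketch of how such value-less rows could be counted and dropped before import. It assumes a tab-separated matrix whose first column holds the probe ID and whose remaining columns hold per-sample values, with empty cells, `NA`, or `NaN` treated as missing. The probe excerpt is hypothetical.

```python
import csv
import io


def drop_empty_rows(tsv_text, empty_tokens=frozenset({"", "NA", "NaN"})):
    """Split a probe x sample TSV into kept rows and a count of value-less rows.

    Assumes the first column holds the probe ID and every other column a sample value.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    kept, dropped = [], 0
    for row in reader:
        if not row:
            continue  # skip blank lines entirely
        if all(value.strip() in empty_tokens for value in row[1:]):
            dropped += 1  # probe name only, no measured values
        else:
            kept.append(row)
    return header, kept, dropped


# Hypothetical three-probe excerpt.
sample = (
    "probe\tS1\tS2\n"
    "cg00000029\t0.41\t0.38\n"
    "cg00000108\t\t\n"
    "cg00000109\tNA\t\n"
)
header, kept, dropped = drop_empty_rows(sample)
print(f"kept={len(kept)} dropped={dropped}")  # kept=1 dropped=2
```

A large `dropped` count relative to the total row count would suggest a formatting problem upstream rather than genuinely missing measurements.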
cll_broad
- [x] We can remove the WGS samples (91) from the cohort since the genomic data is not available.
- [x] Although the WGS data is missing, the other gene percentages are close to Fig. 1, but NOTCH1 is much lower: 5% in the portal versus ~12% in the paper. Can we double-check?
- [x] We can add the focal events as a generic assay. The data is in the supplementary tables. See the OncoPrint in Fig. 1.
- [x] Gene panels can be added based on the Sequencing Type attribute.
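For the gene panel item, here is a sketch of deriving a gene panel matrix from the Sequencing Type attribute. It assumes the cBioPortal gene panel matrix layout: a `SAMPLE_ID` column followed by one column per profile (here only `mutations`), holding a panel ID per sample. The sequencing type values, panel IDs, and sample IDs are all hypothetical, and unmapped types are simply written as `NA`.

```python
import csv
import io

# Hypothetical mapping from the Sequencing Type clinical attribute to gene panel IDs.
PANEL_BY_SEQ_TYPE = {
    "WES": "HYPOTHETICAL_WES_PANEL",
    "Targeted": "HYPOTHETICAL_CLL_PANEL",
}


def gene_panel_matrix(samples):
    """Build a tab-separated gene panel matrix (SAMPLE_ID, mutations) from sequencing types."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["SAMPLE_ID", "mutations"])
    for sample_id, seq_type in samples.items():
        # Sequencing types without a mapped panel are written as NA.
        writer.writerow([sample_id, PANEL_BY_SEQ_TYPE.get(seq_type, "NA")])
    return out.getvalue()


matrix = gene_panel_matrix({"CLL_001": "WES", "CLL_002": "Targeted", "CLL_003": "WGS"})
print(matrix)
```

Keeping the mapping in one dictionary makes it easy to review against the supplementary table before the matrix is regenerated.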
sarcoma_msk_2022
- [x] OncoTree codes need to be updated based on the 'final diagnosis' from the supplementary table. The samples with an NA code have a malignant cancer type.
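For the OncoTree item, here is a sketch of assigning codes from the final diagnosis while flagging anything that would otherwise end up as NA. The fallback groupings are the ones stated for this study in the review (e.g., Extraskeletal Osteosarcoma grouped under OS). The direct mapping and the unmapped example diagnosis are hypothetical placeholders for the supplementary table.

```python
# Final diagnoses without their own OncoTree code, grouped with a similar type
# (groupings as described for sarcoma_msk_2022).
FALLBACK_CODES = {
    "Extraskeletal Osteosarcoma": "OS",
    "Sarcoma, Unclassified": "SARCNOS",
    "Undifferentiated Round Cell Sarcoma/Ewing-Like": "RCSNOS",
}


def assign_oncotree(final_diagnosis, direct_codes):
    """Return an OncoTree code for a final diagnosis, or None if it still needs curation."""
    name = final_diagnosis.strip()
    if name in direct_codes:
        return direct_codes[name]
    return FALLBACK_CODES.get(name)


# Hypothetical direct mapping; the real one would come from the supplementary table.
direct = {"Osteosarcoma": "OS", "Ewing Sarcoma": "ES"}
diagnoses = ["Osteosarcoma", "Extraskeletal Osteosarcoma", "Sarcoma, Unclassified", "Some New Entity"]
codes = {d: assign_oncotree(d, direct) for d in diagnoses}
unmapped = [d for d, c in codes.items() if c is None]
print(codes)
print("Needs curation:", unmapped)  # Needs curation: ['Some New Entity']
```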
sarcoma_msk_2022
- [x] Description does not include normals. No matched normal tissue.
- [x] What was used to define the OncoTree codes, the original or the corrected diagnosis? Whichever was used should be stated in the description of the meta fields. Final diagnosis was used; added to the meta description.
- [x] Any idea why the counts in Figure 1A don't match our cancer type counts? Some final diagnoses don't have OncoTree codes and so were grouped with similar cancer types (e.g., Osteosarcoma (OS) assigned to Extraskeletal Osteosarcoma; Sarcoma, NOS (SARCNOS) assigned to Sarcoma, Unclassified; and Round Cell Sarcoma, NOS (RCSNOS) assigned to Undifferentiated Round Cell Sarcoma/Ewing-Like).
- [x] For tumor purity, add computational tumor purity to the description.
- [x] This may be a small ask, but should the ID prefix be Sarcoma_, sarcoma_, or SARCOMA_? Changed to sarcoma_.
- [x] Can we confirm that no mutations were filtered out in the paper? Not mentioned; only variants of unknown significance were excluded from the analyses.
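For the tumor purity and OncoTree description items, here is a sketch of writing the four `#`-prefixed header rows that cBioPortal clinical data files carry above the attribute-name row: display name, description, datatype, and priority. The column selection and description strings are illustrative assumptions, not the study's actual metadata.

```python
# Each attribute: (column name, display name, description, datatype, priority).
# Descriptions are illustrative; e.g., TUMOR_PURITY notes that purity is computational.
ATTRIBUTES = [
    ("PATIENT_ID", "Patient Identifier", "Identifier to uniquely specify a patient.", "STRING", "1"),
    ("SAMPLE_ID", "Sample Identifier", "A unique sample identifier.", "STRING", "1"),
    ("ONCOTREE_CODE", "Oncotree Code", "OncoTree code assigned from the final diagnosis.", "STRING", "1"),
    ("TUMOR_PURITY", "Tumor Purity", "Computational tumor purity.", "NUMBER", "1"),
]


def clinical_header(attributes):
    """Return the four '#'-prefixed metadata rows plus the attribute-name row."""
    # Transpose the attribute tuples into one tuple per header row.
    names, displays, descriptions, datatypes, priorities = zip(*attributes)
    lines = ["#" + "\t".join(row) for row in (displays, descriptions, datatypes, priorities)]
    lines.append("\t".join(names))
    return "\n".join(lines) + "\n"


print(clinical_header(ATTRIBUTES))
```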