datahub
News Release Studies
Cancer studies updated in this pull request:
Study_Id | Testing Instance Link | Sample Count |
---|---|---|
difg_msk_2023 | https://triage.cbioportal.mskcc.org/study/summary?id=difg_msk_2023 | 73 Samples |
cll_broad_2022 | https://triage.cbioportal.mskcc.org/study/summary?id=cll_broad_2022 | 1154 Samples |
msk_spectrum_tme_2022 | https://triage.cbioportal.mskcc.org/study/summary?id=msk_spectrum_tme_2022 | 82 Samples |
sarcoma_msk_2022 | https://triage.cbioportal.mskcc.org/study/summary?id=sarcoma_msk_2022 | 7494 Samples |
mbn_sfu_2023 | https://triage.cbioportal.mskcc.org/study/summary?id=mbn_sfu_2023 | 297 Samples |
prad_organoids_msk_2022 | https://triage.cbioportal.mskcc.org/study/summary?id=prad_organoids_msk_2022 (Data is in impact repo) | 47 Samples |
thyroid_lhsc_2024 | https://triage.cbioportal.mskcc.org/study/summary?id=thyroid_lhsc_2024 | 190 Samples |
coadread_cass_2020 | https://triage.cbioportal.mskcc.org/study/summary?id=coadread_cass_2020 | 146 Samples |
prad_msk_mdanderson_2023 | https://triage.cbioportal.mskcc.org/study/summary?id=prad_msk_mdanderson_2023 (Data is in private repo) | 88 Samples |
Total | | 9571 Samples |
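A quick sanity check on the total, as a minimal sketch: the counts below are copied by hand from the table, and the dictionary name is only illustrative.

```python
# Sample counts copied from the table in this pull request.
sample_counts = {
    "difg_msk_2023": 73,
    "cll_broad_2022": 1154,
    "msk_spectrum_tme_2022": 82,
    "sarcoma_msk_2022": 7494,
    "mbn_sfu_2023": 297,
    "prad_organoids_msk_2022": 47,
    "thyroid_lhsc_2024": 190,
    "coadread_cass_2020": 146,
    "prad_msk_mdanderson_2023": 88,
}

# Sum across all nine studies and compare against the reported total.
total = sum(sample_counts.values())
print(total)  # 9571
```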
Note: Holding off the normal studies for this news release, as Leonie wanted to release them all at once.
- sarcoma_msk_2022: CNA pending from author.
- cll_broad_2022: mutsig missing data pending from author.
- mbn_sfu_2023: BL and DLBCL were aligned to different genome builds, but variants are reported in the same hg38 projection.
- coadread_cass_2020: protein expression, CNA, methylation pending from authors.
*Missing data will be added as soon as authors respond.
Thanks for the updates @Rima-Waleed, @BabyASatravada! A few questions:
prad_organoids_msk_2022:
- Cohort size: 40 models were analyzed in the paper (see the attached sheet) prad_organoids.txt
- We are missing one cell line C4-2 from the study. - No data for that cell line. Confirmed from author.
- 1 organoid, MSKPCa4(MSKEF1), from the NEPC group is included in the cohort. The other 4 (PARCB1, PARCB3, PARCB6, PARCB8) are missing from the cohort. They seem to have been sequenced for RNA. Any reason?
- Sample Type: There are 22 organoids, 6 PDX and 12 cell lines. And 12 organoids have a matching tumor sample. The sample types attribute doesn't reflect that. Can we correct the sample types as listed in the attached sheet? -DONE
- Sequencing Type/panel: The samples have been through IMPACT/WES sequencing. See the attached sheet. We need to update the attribute accordingly. -DONE
- RNA-seq: Can we mention in the RNA meta description that the RNA-seq data is available only for organoid samples? The cell lines and PDXs also have RNA-seq data; maybe we should collect them? - REQUESTED FROM THE AUTHOR
- Can we double check the cancer type? These are not all adenocarcinomas. -DONE
- The study description needs to be updated to include WES. And update the study name to "Prostate Cancer (MSK, Science 2022)". -DONE
- Missing matched normal status. -DONE
- Can we add a README to how the data was collected and transformed? - DONE
thyroid_lhsc_2024:
1. "lhsc" should be capitalized in the study name.
2. It should be "Whole-exome or whole-genome sequencing" in the description, as there is no overlap per the paper. Can we also expand the cancer types (in ATC and co-occurring DTC samples)?
3. How did we get to the 158p/190s cohort size for the portal? There are more samples per the paper:
Seq Type | Samples after filtering |
---|---|
WES | 280 |
WGS | 18 |
CNA | 259 |
AmpliSEQ | 54 |
mRNA | 24 |
- We should follow the patient and sample Ids listed in Supp Table 1A. The Sample Ids in the study are a mix of pat/sample ids from the supp table. Any reason?
- Can we add a README to how the data was collected and transformed? And add to the description that only ATC and co-DTC samples are displayed in the cohort.
- Can we collect RNA-seq data from authors?
coadread_cass_2020:
- Can we remove "Metastatic" from the study name, since this cohort has a mix of CRC and mCRC cases?
- Can we update the description to something like "Whole-exome sequencing of 146 colorectal tumor/normal pairs from a Chinese cohort, covering 70 metastatic and 76 non-metastatic colorectal cancer patients."?
- We might want to double check the mutation data. A few samples have lots of splice site variants. Ex. https://triage.cbioportal.mskcc.org/patient?studyId=coadread_cass_2020&caseId=CCRC-0212
- Is the MAF we have a filtered version? Cross checking to Table S1 (Somatic mutations in mCRC sheet), there are a lot of variants per sample in the portal.
- CA19-9 Antigen & CEA Biomarker should be numbers.
- Can progression time & follow up time charts be converted to timeline? Do we have an anchor date?
- It would be nice to add proteome and phosphoproteome data at some point.
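For the splice site and numeric-attribute points above, a minimal sketch of how these could be checked locally. It assumes the MAF uses the standard `Tumor_Sample_Barcode` and `Variant_Classification` columns, with `Splice_Site` as the classification value. The function names and example values are hypothetical and not taken from the study files.

```python
import csv
from collections import Counter


def count_splice_site_variants(maf_text):
    """Return {Tumor_Sample_Barcode: number of rows classified as Splice_Site}."""
    # MAF files may begin with '#' comment lines (e.g. '#version 2.4'); skip those.
    lines = [ln for ln in maf_text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    counts = Counter()
    for row in reader:
        if row["Variant_Classification"] == "Splice_Site":
            counts[row["Tumor_Sample_Barcode"]] += 1
    return dict(counts)


def non_numeric_values(values, missing=("", "NA")):
    """Return the values that would block treating an attribute as a number."""
    bad = []
    for value in values:
        value = value.strip()
        if value in missing:
            continue  # treat blanks and NA as missing, not as errors
        try:
            float(value)
        except ValueError:
            bad.append(value)
    return bad
```

Samples with unusually high counts from the first function would be the ones to compare against Table S1. Any values returned by the second function (for example, entries such as `<2`) would need cleaning before CA19-9 and CEA can be loaded as numeric attributes.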
prad_msk_mdanderson_2023:
- The authors have mentioned in the paper that the data is available at https://www.cbioportal.org/study/summary?id=mixed_msk_mdanderson_2023. So we will have to go with this link. - The authors have corrected it.
- Can we update the description to "Targeted and whole-genome sequencing of 44 MD Anderson Prostate Cancer PDX models derived from 38 patients with tumor."? - Done
- How did we do the diploid expression z-scores? I think we should remove that as it's not in use. - removed
- Is the RNA data for T200 samples duplicated? I don't think that's correct, as the z-scores would be affected. - Done
https://www.cbioportal.org/study/summary?id=prad_msk_mdanderson_2023
- [x] Title needs the publication.
- [x] description has typos
- [ ] 44 PDX models but we say 88 samples