datahub
new study: ccrcc_wcm_2022
What?
Added a new clear cell renal carcinoma (ccRCC) study. The paper has been accepted, and the PMID and citation are available.
checks
For all pull requests:
- [x] Passes validation
For a new study (in addition to above):
- [x] Do the study name and study ID follow our convention? e.g. Tumor_Type (Institute, Journal Year); brca_mskcc_2015
- [x] is study meta data complete? e.g. pmid, group of PUBLIC
- [x] were all samples profiled with WES/WGS? If not, is gene panel file curated?
- [ ] are oncotree codes of all samples curated; Cancer Type and Cancer Type Detailed need to be added in addition to Oncotree Code
- [x] clinical sample and patient data with meta files
- [x] mutations data with meta files
- [x] MAF is based on hg19
- [ ] MAF with 2 isoforms: uniprot and mskcc
- [x] CNA data with meta files
- [ ] CNA segment data with meta files
- [ ] Expression data including z-scores with meta files
- [x] Case-lists for all profiles.
- [ ] Manual checking (Niki or JJ): Triage or private Portal link here
Hi @alexsigaras, thank you for the PR. The data formats look good! Some fixes are required before we can merge:
- Can you add the Oncotree Code (http://oncotree.mskcc.org), Cancer Type and Cancer Type Detailed columns to the clinical sample file? You can use the script here to fill in the CT and CTD columns based on the oncotree codes - https://github.com/cBioPortal/datahub-study-curation-tools/tree/master/oncotree-code-converter
- Do all the samples have a matched normal? Can this info be added to the clinical file as well? The attribute we use is SOMATIC_STATUS (Matched/Unmatched).
- The paper mentions 68 samples from 44 patients, but the data here covers 67 samples from 43 patients. Can you add a note to the study description explaining why one patient was excluded from the portal?
- Is the MAF annotated with Genome Nexus?
- Can you rename the meta_mutation_extended.txt file to meta_mutations.txt?
- A few case lists are missing. Can you generate them using the script here - https://github.com/cBioPortal/datahub-study-curation-tools/tree/master/generate-case-lists
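
For reference, the clinical-file checks above (sample and patient counts, plus the requested columns) could be sanity-checked with something like the following. This is only a rough sketch, not one of the curation tools linked above: it assumes a tab-delimited clinical sample file whose metadata lines start with `#`, and it assumes the column names PATIENT_ID, SAMPLE_ID, ONCOTREE_CODE, CANCER_TYPE, CANCER_TYPE_DETAILED and SOMATIC_STATUS. The function name and the toy data are hypothetical.

```python
import csv

# Columns requested in the review; names assumed to follow the usual
# cBioPortal clinical attribute conventions.
REQUIRED_COLUMNS = [
    "ONCOTREE_CODE",
    "CANCER_TYPE",
    "CANCER_TYPE_DETAILED",
    "SOMATIC_STATUS",
]


def summarize_clinical_samples(text):
    """Count unique samples/patients and list any requested columns that are absent."""
    # Drop blank lines and the '#'-prefixed metadata lines at the top of the file.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    rows = list(reader)
    header = reader.fieldnames or []
    return {
        "samples": len({row["SAMPLE_ID"] for row in rows}),
        "patients": len({row["PATIENT_ID"] for row in rows}),
        "missing_columns": [c for c in REQUIRED_COLUMNS if c not in header],
    }


# Toy input (made up, not the study data).
example = (
    "#Patient Identifier\tSample Identifier\tOncotree Code\n"
    "PATIENT_ID\tSAMPLE_ID\tONCOTREE_CODE\n"
    "P1\tS1\tCCRCC\n"
    "P1\tS2\tCCRCC\n"
    "P2\tS3\tCCRCC\n"
)
print(summarize_clinical_samples(example))
```

Running this against the real clinical sample file (read into a string) would show whether the counts line up with the 67 samples / 43 patients noted above and which of the requested columns are still missing.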
Thanks!
Testing instance of the study: https://triage.cbioportal.mskcc.org/study/summary?id=ccrcc_wcm_2022
Thanks @rmadupuri - We will take care of the remaining items and update this PR for final review.
@rmadupuri: We have updated the cohort based on the feedback requested and have also updated the branch to match master. Could you please take another look and let us know if everything looks ok to merge?
Thanks, Alex
Thank you for the updates @alexsigaras. The fixes look good!
One additional issue that we noticed with the MAF: there are ~500 variants where either the position is off by 1 or the alleles are incorrect. The annotation failed on these variants, so they would show up incorrectly in the portal. I am attaching the problematic variants here - data_mutations_unannotated.txt.
Could you double check these and give us the right coordinates/alleles? Let us know if you have any questions.
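
In case it helps with triaging, here is a minimal sketch of how the two failure modes could be told apart. It is not how Genome Nexus validates variants; it simply compares a row's Reference_Allele with the reference bases at Start_Position (1-based) and then retries one base to either side. The `ref_seqs` mapping of chromosome name to sequence string is a hypothetical stand-in for however the hg19 reference is loaded, and the function name is made up.

```python
def classify_variant(ref_seqs, chrom, start, ref_allele):
    """Return 'ok', 'off_by_one', 'mismatch' or 'skipped' for one MAF row."""
    if ref_allele in ("", "-"):
        # Insertions carry no reference bases to compare against.
        return "skipped"
    seq = ref_seqs[chrom]
    allele = ref_allele.upper()

    def matches(pos):
        i = pos - 1  # MAF positions are 1-based; Python strings are 0-based
        return i >= 0 and seq[i:i + len(allele)].upper() == allele

    if matches(start):
        return "ok"
    if matches(start - 1) or matches(start + 1):
        return "off_by_one"
    return "mismatch"


# Toy reference (made up), standing in for hg19 sequence.
toy_reference = {"1": "ACGTACGT"}
for start, ref in [(3, "G"), (4, "G"), (1, "T"), (5, "-")]:
    print(start, ref, classify_variant(toy_reference, "1", start, ref))
```

Rows that come back as off_by_one are candidates for a coordinate shift, while mismatch rows more likely have wrong alleles. In repetitive sequence an off-by-one position can still match by chance, so this is only a first pass.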
Thanks!
Hi @alexsigaras, just checking: do you have an update on the above issue? Thanks!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.