datahub
Soft tissue and bone sarcoma public study
checks
For all pull requests:
- [ ] Passes validation
For a new study (in addition to above):
- [ ] Do the study name and study ID follow our convention? e.g. Tumor_Type (Institute, Journal Year); brca_mskcc_2015
- [ ] Is the study metadata complete? e.g. pmid, group of PUBLIC
- [ ] Were all samples profiled with WES/WGS? If not, is the gene panel file curated?
- [ ] Are the OncoTree codes of all samples curated? Cancer Type and Cancer Type Detailed need to be added in addition to Oncotree Code
- [ ] clinical sample and patient data with meta files
- [ ] mutations data with meta files
- [ ] MAF is based on hg19
- [ ] MAF with 2 isoforms: uniprot and mskcc
- [ ] CNA data with meta files
- [ ] CNA segment data with meta files
- [ ] Expression data including z-scores with meta files
- [ ] Case-lists for all profiles.
- [ ] Manual checking (Niki or JJ): Triage or private Portal link here
- [x] Study ID should not be called mixed, change to sarcoma_...
- [x] There is no Cancer Type or Cancer Type Detailed. If they are not in the paper, use Final Dx and note that in the descriptions of Oncotree Code, Cancer Type, and Cancer Type Detailed.
- [x] The clinical fields are not properly curated format-wise; they are all lowercase.
- [x] Why is Age at Dx NA for all patients?
- [x] Your meta description is not right; it does not follow our guidelines. Please adhere to the documentation when creating the metadata.
- [x] The paper says "Through targeted panel sequencing of 7494 sarcomas representing 44 histologies", and the description and sample count say the same, but your case list just says 71%. Why are the numbers not in sync? From the paper: "A total of 28,546 known or likely pathogenic variants (11,536 non-synonymous single nucleotide variants [SNVs]/indels, 13,239 copy number alterations, and 3771 rearrangements) were detected. No known or likely pathogenic alterations were detected in 226 (3.0%) samples using this gene panel."
- [x] We need to change the sample IDs; a bare number is not right. Add the study ID as a prefix, followed by '-' and the sample ID.
- [x] Same for patient ID
- [x] No CN and SV? SV added, CN pending from author
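
The ID changes requested above could be scripted roughly as follows. This is a minimal sketch, not the tooling actually used for this study. The function name `prefix_ids` and its arguments are hypothetical. It assumes a tab-delimited clinical file whose lines starting with `#` are metadata rows, followed by a column header row containing `SAMPLE_ID` and/or `PATIENT_ID`.

```python
def prefix_ids(lines, study_id, id_columns=("SAMPLE_ID", "PATIENT_ID")):
    """Prepend '<study_id>-' to the values in the given ID columns.

    `lines` is a list of tab-delimited rows without trailing newlines.
    Rows starting with '#' (assumed metadata rows) and the column header
    row are passed through unchanged. IDs that already carry the prefix
    are left alone, so running the function twice is harmless.
    """
    prefix = study_id + "-"
    out = []
    targets = None  # column indexes to rewrite, set once the header row is seen
    for line in lines:
        if line.startswith("#"):
            out.append(line)
            continue
        cells = line.split("\t")
        if targets is None:
            # First non-comment row is the column header.
            targets = [i for i, name in enumerate(cells) if name in id_columns]
            out.append(line)
            continue
        for i in targets:
            if i < len(cells) and cells[i] and not cells[i].startswith(prefix):
                cells[i] = prefix + cells[i]
        out.append("\t".join(cells))
    return out
```

The same function would need to be applied to every file that carries these IDs (clinical, mutation, SV, and case-list files have different layouts, so this sketch only covers the clinical-style tables); otherwise the IDs would fall out of sync across files.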