datahub
Adding Burkitt Lymphoma study
checks
For all pull requests:
- [ ] Passes validation
For a new study (in addition to above):
- [ ] Do the study name and study ID follow our convention? e.g. Tumor_Type (Institute, Journal Year); brca_mskcc_2015
- [ ] Is the study metadata complete? e.g. pmid, group of PUBLIC
- [ ] Were all samples profiled with WES/WGS? If not, is the gene panel file curated?
- [ ] Are the oncotree codes of all samples curated? Cancer Type and Cancer Type Detailed need to be added in addition to Oncotree Code
- [ ] clinical sample and patient data with meta files
- [ ] mutations data with meta files
- [ ] MAF is based on hg19
- [ ] MAF with 2 isoforms: uniprot and mskcc
- [ ] CNA data with meta files
- [ ] CNA segment data with meta files
- [ ] Expression data including z-scores with meta files
- [ ] Case-lists for all profiles.
- [ ] Manual checking (Niki or JJ): Triage or private Portal link here
- [x] "" in the file
- [x] If there are no SVs in the genomic profiles sample count, then why is the SV table showing up? The authors' approach for SVs only provides MYC rearrangements to IGH, IGK, IGL, BCL6; should these show up in the genomic profiles sample count and SV table?
- [x] Paper says WGS and RNA-sequencing reads were aligned to GRCh38 for BL and GRCh37 for DLBCL. What is the build of the entire cohort? The reference build of the entire cohort is GRCh38: the supplemental methods state that sequencing read alignment "was performed as previously described in detail" (ref. 1), referring to the 2019 BL publication where reads were aligned to GRCh38. MAF supplemental file S6 also shows the NCBI build as GRCh38.
- [x] Are the mutation counts really that high? Is that normal in BL and DLBCL? Can we confirm about matched normals? Literature search shows that Epstein-Barr virus (associated with both BL and DLBCL) is associated with high mutation counts. Matched normal (somatic status) added to clinical sample file.
- [x] No RNA-Seq data? Not provided in supplemental files, emailed author.
- [x] Can we check the sample IDs again? Some look like names. Double-checked the sample IDs; some are names as per supplementary file S1.
- [x] Can we make KM plots for PFS as well?
- [x] The sample count in the paper and portal don't match
- [x] Same reason maybe, but Table 1 numbers don't match what we have. The number in the portal (297) reflects the newly sequenced samples without any external previously published cohort, so it accurately represents what was actually newly sequenced in this study. "The number 230 mentioned in the introduction and visual abstract comes from the count of those BL patients that have WGS data available, so can be used in the calling and analysis of somatic mutations. This count does not include BL cell lines (N=22) or those patients that only had RNA-Seq data available (no WGS). These patients with RNA-Seq available are included in the overall cohort and therefore the number in Table 1 is larger than what is in Supplemental Table 1. Both Supplemental Table 1 and Supplemental Table 2 only report the newly sequenced data, of which there are 43 newly sequenced DLBCLs and the remaining are the DLBCLs from previously published studies."
- [x] How are we distinguishing adult and pediatric? Age_category added in the patient file: Categorized based on St Jude hospital cut-off value for Pediatric <= 20 years
- [x] Should we rename it to also include DLBCL? Renamed to mbn_sfu_2023 to include DLBCL and BL as Mature B-Cell Neoplasms.
- [x] No CNA data? num.mark is not a standard output from the copy number callers the authors use routinely (battenberg and controlfreec).
- [x] Can we add mutational signatures as generic assay?
- [x] Paper is not mentioned in the title
- [x] I am seeing OS status twice in the clinical file. Is that a bug or a data issue? Maybe a bug; I don't see it on my end.
- [x] "Comprehensive whole-genome sequencing of 92 adult and 138 pediatric Burkitt lymphoma (BL) and diffuse large B-cell lymphoma (DLBCL) patient tumors" does not sum up to 297. The 230 patient tumors (92+138) include samples with WGS data only (and not samples with only RNA-seq data). Double-checked with the author, who confirmed that 297 reflects the newly sequenced samples without any external previously published cohort, so it accurately represents what was actually newly sequenced in this study.
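
For reference, the adult/pediatric split noted in the checklist (Pediatric <= 20 years) could be sketched roughly as below. This is only an illustration and not the script used for this study: the function name, the "Pediatric"/"Adult"/"NA" labels, and the handling of missing ages are assumptions.

```python
from typing import Optional

# Cut-off stated in the checklist: Pediatric <= 20 years.
PEDIATRIC_MAX_AGE = 20


def age_category(age_years: Optional[float]) -> str:
    """Return a hypothetical age category label for a patient age in years."""
    if age_years is None:
        # Assumption: patients with a missing age are left uncategorized.
        return "NA"
    # The cut-off is inclusive, so a 20-year-old is categorized as pediatric.
    return "Pediatric" if age_years <= PEDIATRIC_MAX_AGE else "Adult"
```

With these assumed labels, a patient aged exactly 20 falls in the pediatric group and a patient aged 21 in the adult group.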
I see this is ticked: "Can we add mutational signatures as generic assay?" I can't see them.