
mutation_type data issue

Open jjgao opened this issue 6 years ago • 8 comments

@jjgao commented on Wed Apr 12 2017

There are some non-standard mutation types in the public portal database (see data-issue.xlsx). For example, nsclc_tcga_broad_2016 has no genomic locations.

Can we fix the data?

Below are the queries used to generate the spreadsheet above.

-- Count mutation events per mutation_type.
select mutation_type, count(*)
from mutation_event
group by mutation_type;

-- List events with selected non-standard types, together with their genetic profile.
select gp.stable_id, chr, start_position, end_position, reference_allele, tumor_seq_allele, protein_change, mutation_type
from mutation_event me
join mutation m on me.mutation_event_id = m.mutation_event_id
join genetic_profile gp on m.genetic_profile_id = gp.genetic_profile_id
where me.mutation_type in ('splice','Missense','Indel', 'NA', 'InFrameIns', 'Splice_site_SNP');

jjgao avatar Jun 28 '18 20:06 jjgao

Here is an updated list:

mutation_type           count(*)
Start_Codon_SNP         1
nonframeshift.deletion  1
vIII deletion           1
Exon skipping           1
exon14skip              1
Splice_Site_SNP         1
COMPLEX_INDEL           1
nonframeshift           1
Essential splice        2
ESSENTIAL_SPLICE_SITE   2
frameshift-deletion     2
frameshift_deletion     2
lincRNA                 3
nonframeshift-deletion  4
Nonsense                4
Silent                  5
Indel                   6
frameshift              6
In frame INDEL          7
Splicing Site Mutation  8
NON_SYNONYMOUS_CODING   14
stopgain                15
Frame Shift INDEL       19
FRAMESHIFT_CODING       27
splice                  28
Missense                29
Frame_Shift             32
3UTR                    40
non-coding-exon         51
5UTR                    51
5'Flank                 53
coding                  60
nonsynonymous           92
5Flank                  121
NA                      782
Nonstop_Mutation        2950
In_Frame_Ins            3125
Translation_Start_Site  3780
Targeted_Region         6518
In_Frame_Del            17289
Frame_Shift_Ins         43871
Fusion                  50049
Splice_Region           63858
Splice_Site             70971
Frame_Shift_Del         100550
Nonsense_Mutation       184773
Missense_Mutation       2469774
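To separate the long tail from the expected values, a list like the one above can be partitioned against a set of accepted types. The sketch below is only illustrative: `EXPECTED_TYPES` is an assumption (the twelve high-count values in the list, which the thread does not explicitly declare as the standard set), and `split_by_expected` is a hypothetical helper, not part of any cBioPortal tooling.

```python
# ASSUMPTION: the twelve high-count values in the list above are the expected ones.
EXPECTED_TYPES = {
    "Missense_Mutation", "Nonsense_Mutation", "Nonstop_Mutation",
    "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
    "Splice_Site", "Splice_Region", "Translation_Start_Site",
    "Targeted_Region", "Fusion",
}


def split_by_expected(counts):
    """Partition {mutation_type: count} into (expected, unexpected) dicts."""
    expected = {t: n for t, n in counts.items() if t in EXPECTED_TYPES}
    unexpected = {t: n for t, n in counts.items() if t not in EXPECTED_TYPES}
    return expected, unexpected


# A few rows copied from the list above.
sample_counts = {
    "Missense_Mutation": 2469774,
    "Missense": 29,
    "frameshift_deletion": 2,
    "Frame_Shift_Del": 100550,
    "NA": 782,
}

expected, unexpected = split_by_expected(sample_counts)

# Print the unexpected types, smallest count first.
for mutation_type, count in sorted(unexpected.items(), key=lambda kv: kv[1]):
    print(f"{mutation_type}\t{count}")
```

The match is exact and case-sensitive on purpose, so near-duplicates such as `frameshift_deletion` and `Frame_Shift_Del` stay distinguishable.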

jjgao avatar Jun 28 '18 20:06 jjgao

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Aug 10 '20 03:08 stale[bot]

@jjgao updated list:

5'Flank                 74       (acc_tcga, included in mutated issue)
Frame_Shift             1        (msk_impact_2017, ???)
Frame_Shift_Del         147813
Frame_Shift_Ins         62016
Fusion                  67463
In_Frame_Del            26211
In_Frame_Ins            6440
Missense_Mutation       3476025
NA                      8        (hcc_inserm_fr_2015, cll_iuopa_2015, included in mutated issue)
Nonsense_Mutation       243791
Nonstop_Mutation        4084
p.Q274L                 1        (cesc_tcga, included in mutated issue)
Splice_Region           89463
Splice_Site             100474
Targeted_Region         177
Translation_Start_Site  5856

  • The ones included in the mutated issue will be fixed once we have the correct alleles and have re-annotated them.
  • The one variant in msk_impact_2017 appears to be a deletion, but re-annotation changed the alleles and turned it into an insertion. This is a known annotator issue that also affects other studies in the mutated issue list. Details are attached at the bottom.
  • Are we okay with the rest of the list?

msk_impact_2017 (empty columns omitted; remaining values are in their original column order, and "(blank)" marks a field that is filled in only one of the two rows)

BEFORE: TP53 | (blank) | GRCh37 | 17 | 7578456 | 7578481 | + | frameshift_variant | Frame_Shift | DEL | GCGGACGCGGGTGCCGGGCGGGGGTG | GCGGACGCGGGTGCCGGGCGGGGGTGT | - | P-0007051-T01-IM5 | 255 | 312 | ENST00000269305.4:c.449_474del | p.Thr150SerfsTer22 | p.T150Sfs22 | ENST00000269305 | NM_001126112.2 | 150 | aCACCCCCGCCCGGCACCCGCGTCCGC/a | 0

AFTER: TP53 | 7157 | GRCh37 | 17 | 7578481 | 7578482 | + | frameshift_variant | Frame_Shift_Ins | INS | - | GCGGACGCGGGTGCCGGGCGGGGGTGT | T | P-0007051-T01-IM5 | 255 | 312 | ENST00000269305.4:c.448dup | p.Thr150AsnfsTer31 | p.T150Nfs31 | ENST00000269305 | NM_001126112.2 | 150 | aca/aAca | 0
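One possible reading of the BEFORE row is that its two tumor alleles disagree: the second is "-" (a deletion of the whole reference), while the first is the reference plus a trailing T (an insertion once the shared prefix is removed). The sketch below only illustrates that ambiguity. The helper names are hypothetical, and whether the annotator actually picks the first tumor allele is an assumption that the thread does not confirm.

```python
def strip_shared_prefix(ref, alt):
    """Drop the leading bases that ref and alt have in common ('-' means empty)."""
    ref = "" if ref == "-" else ref
    alt = "" if alt == "-" else alt
    i = 0
    while i < len(ref) and i < len(alt) and ref[i] == alt[i]:
        i += 1
    return ref[i:], alt[i:]


def variant_type(ref, alt):
    """Simplified classification; anything that is not a pure INS/DEL/SNP is 'OTHER'."""
    ref, alt = strip_shared_prefix(ref, alt)
    if not ref and not alt:
        return "NONE"  # the two alleles are identical
    if not ref:
        return "INS"
    if not alt:
        return "DEL"
    return "SNP" if len(ref) == 1 and len(alt) == 1 else "OTHER"


# Alleles copied from the BEFORE row above.
REF = "GCGGACGCGGGTGCCGGGCGGGGGTG"
TUMOR_ALLELE_1 = "GCGGACGCGGGTGCCGGGCGGGGGTGT"
TUMOR_ALLELE_2 = "-"

print(variant_type(REF, TUMOR_ALLELE_2))  # reference vs. second tumor allele
print(variant_type(REF, TUMOR_ALLELE_1))  # reference vs. first tumor allele
```

Under this reading, the study's DEL label matches the second tumor allele, while the AFTER row (reference "-", inserted T) matches what prefix trimming gives for the first.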

yichaoS avatar Sep 01 '20 21:09 yichaoS

This looks much better! Thanks, @yichaoS. How do our private databases look?

jjgao avatar Sep 02 '20 17:09 jjgao

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Mar 02 '21 03:03 stale[bot]

I just ran the query against the private DB; the results are below (the number is the # of mutations with this issue):

 29 accrf_merged_unfiltered_mutations
 29 acyc_fmi_morris_unfiltered_mutations
 64 all_phase2_target_2017_mutations
315 aml_target_2017_mutations
  1 blca_cmo_solitd_5931_b_mutations
  1 brca_mskcc_solitd_5065_g_mutations
  1 brca_sanger_2012_mutations
  1 brca_sanger_2016_mutations
  3 breast_msk_2017_mutations
  1 cellline_cclp_sanger_mutations
 20 cgci_blgsp_2017_mutations
  2 coadread_cbe_saltzl_5281_mi_mutations
  2 coadread_saltzl_2015_mutations
 22 egc_msk_janjigy_2017_mutations
101 gbm_yale_mutations
  1 gct_tcga_freeze_mutations
  1 lcll_13-079_mskcc_foundation_mutations
  1 lihc_tcga_pub_mutations
  3 lms_cbe_agaramn_4786_mutations
  1 lymphoma_filtered_merged_mutations
  1 lymphoma_merged_mutations
  4 mbl_icgc_2014_mutations
 14 mixed_13-060_mskcc_foundation_mutations
  1 mixed_13-158_mskcc_foundation_mutations
  3 mixed_dmp_MSK-IMPACT_2013_mutations
 22 mixed_gray_proj_06208_mutations
  1 mixed_lymphoma-sa_therp_filtered_mskcc_foundation_mutations
  1 mixed_lymphoma-sa_therp_mskcc_foundation_mutations
 30 mixed_lymph_mskcc_foundation_160830_mutations
  2 mpn_cbe_rampalr_RR_mpnaml_mutations
 64 mpn_mskcc_2015_mutations
  2 nbl_target_2017_mutations
  1 nsclc_msk_subset_2018_mutations
  2 os_target_2017_mutations
  1 ovt_levine_2015_mutations
  2 prad_mskcc_foundation_mutations
  2 prad_su2c_2018_prad_su2c_mutations
  3 thca_mskcc_faginj_2014_mutations
  3 thhc_mskcc_2015_chant_mutations
  1 ulm_cbe_weigeltb_BW_lmbn_mutations
  1 wt_target_2017_mutations
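The exact query behind these numbers is not shown in the thread. One way to get a per-profile count of this shape is to aggregate the original join by `gp.stable_id`, sketched here against a toy in-memory SQLite schema. The table and column names come from the queries at the top of this issue, and `NON_STANDARD` reuses the IN list from the original query; the rows, the `study_*` profile names, and the `count_by_profile` helper are made up for illustration.

```python
import sqlite3

# Types treated as non-standard: the IN list from the original query in this issue.
NON_STANDARD = ("splice", "Missense", "Indel", "NA", "InFrameIns", "Splice_site_SNP")


def count_by_profile(conn):
    """Return [(stable_id, n)] for mutations whose event has a non-standard type."""
    placeholders = ",".join("?" for _ in NON_STANDARD)
    sql = f"""
        SELECT gp.stable_id, COUNT(*) AS n
        FROM mutation_event me
        JOIN mutation m ON me.mutation_event_id = m.mutation_event_id
        JOIN genetic_profile gp ON m.genetic_profile_id = gp.genetic_profile_id
        WHERE me.mutation_type IN ({placeholders})
        GROUP BY gp.stable_id
        ORDER BY gp.stable_id
    """
    return conn.execute(sql, NON_STANDARD).fetchall()


# Toy schema holding only the columns the query needs; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mutation_event (mutation_event_id INTEGER PRIMARY KEY, mutation_type TEXT);
    CREATE TABLE genetic_profile (genetic_profile_id INTEGER PRIMARY KEY, stable_id TEXT);
    CREATE TABLE mutation (mutation_event_id INTEGER, genetic_profile_id INTEGER);
    INSERT INTO mutation_event VALUES (1, 'Missense'), (2, 'NA'), (3, 'Missense_Mutation');
    INSERT INTO genetic_profile VALUES (10, 'study_a_mutations'), (20, 'study_b_mutations');
    INSERT INTO mutation VALUES (1, 10), (2, 10), (1, 20), (3, 20);
""")

for stable_id, n in count_by_profile(conn):
    print(n, stable_id)
```

Counting rows of `mutation` rather than `mutation_event` means an event shared by several profiles is counted once per profile, which is the reading of "# of mutations" assumed here.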

sbabyanusha avatar Apr 29 '21 17:04 sbabyanusha

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Oct 26 '21 22:10 stale[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jun 19 '22 06:06 stale[bot]