
VEP annotations other than impact?

Open karynne7 opened this issue 1 year ago • 13 comments

Is there a way to pull annotations from the CSQ field other than the built-in INFO.impactful? I see that some of the fields need to be integers or flags, but I have LOFTEE annotations with "HC" or "LC" that I'd like to filter on. (Yes, that could be turned into a flag, but there are other, less binary annotations too.)

I was thinking about bcftools view -i 'INFO/CSQ[70]=="HC"' but that didn't work, and I'm not sure it would capture the whole annotation if there were multiple transcripts. Happy to write my own workaround, but figured I'd double-check with you first! Thanks!

##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|VARIANT_CLASS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|TSL|APPRIS|CCDS|ENSP|SWISSPROT|TREMBL|UNIPARC|GENE_PHENO|NEAREST|SIFT|PolyPhen|DOMAINS|miRNA|HGVS_OFFSET|AF|AFR_AF|AMR_AF|EAS_AF|EUR_AF|SAS_AF|AA_AF|EA_AF|gnomAD_AF|gnomAD_AFR_AF|gnomAD_AMR_AF|gnomAD_ASJ_AF|gnomAD_EAS_AF|gnomAD_FIN_AF|gnomAD_NFE_AF|gnomAD_OTH_AF|gnomAD_SAS_AF|MAX_AF|MAX_AF_POPS|CLIN_SIG|SOMATIC|PHENO|PUBMED|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|LoF|LoF_filter|LoF_flags|LoF_info|existing_InFrame_oORFs|existing_OutOfFrame_oORFs|existing_uORFs|five_prime_UTR_variant_annotation|five_prime_UTR_variant_consequence|REVEL|SpliceRegion|AF_gnomadall|ALT|AUG_eventAUG_event|var_frame|var_stop_loc|var_kozak|ref_frame|ref_stop_loc|ref_kozakvar_frameAUG_event|var_frame|var_stop_loc|var_kozak|ref_frame|ref_stop_loc|ref_kozakvar_stop_locAUG_event|var_frame|var_stop_loc|var_kozak|ref_frame|ref_stop_loc|ref_kozakvar_kozakAUG_event|var_frame|var_stop_loc|var_kozak|ref_frame|ref_stop_loc|ref_kozakref_frameAUG_event|var_frame|var_stop_loc|var_kozak|ref_frame|ref_stop_loc|ref_kozakref_stop_locAUG_event|var_frame|var_stop_loc|var_kozak|ref_frame|ref_stop_loc|ref_kozakref_kozak|CADD_PHRED|CLNSIG|CLNSIGINCL|CLNVC|CLNVI|Delta_G4|Delta_dsRNA|FREQ_exomes|FREQ_genomes|FREQ_gnomadall|G4|GENEINFO|Literature_source|MC|NCboost_chr_rank_perc|PMID|REF|ReMM_probability|STOP_eventSTOP_event|var_preceeding_start_loc|ref_preceeding_start_locvar_preceeding_start_locSTOP_event|var_preceeding_start_loc|ref_preceeding_start_locref_preceeding_start_loc|TE|TE_log2fold|canonical|chr|distance|dsRNA|gene_name|pos|ref|strand|transcript|transcript_ver|type|var">
1	138852	.	C	T	.	.	AC=1;AN=700;HPO_CT=1;GENE=LOC729737;MRNA=NR_039983.2;FXN=non-coding-exon;HGVS_CDNA=n.1395G>A;HGVS_PROT=.;ESP_AF=0;GNOMAD_AF=0.000106689;G1K_AF=0;CADD=27.2;nhet_aff=1;nhet_unaff=0;nhomalt_aff=0;nhomalt_unaff=0;nhet_female_aff=0;nhet_female_unaff=0;nhomalt_female_aff=0;nhomalt_female_unaff=0;nhet_male_aff=1;nhet_male_unaff=0;nhomalt_male_aff=0;nhomalt_male_unaff=0;maxAAF=0.00173611;bravo_AF=0.000615815;CSQ=T|stop_gained|HIGH|AL627309.1|ENSG00000237683|Transcript|ENST00000423372|protein_coding|1/2||ENST00000423372.3:c.458G>A|ENSP00000473460.1:p.Trp153Ter|528/2661|458/780|153/259|W/*|tGg/tAg|rs540391832||-1||SNV|Clone_based_ensembl_gene||YES||||ENSP00000473460||R4GN28&B7Z7W4|UPI0002C88512||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||HC|||GERP_DIST:0&BP_DIST:1223&PERCENTILE:0.587179487179487179&DIST_FROM_LAST_EXON:374&50_BP_RULE:PASS&PHYLOCSF_TOO_SHORT|||||||||||||||||||||||||||||||||||||||||||,T|downstream_gene_variant|MODIFIER|CICP27|ENSG00000233750|Transcript|ENST00000442987|processed_pseudogene||||||||||rs540391832|4016|1||SNV|HGNC|48835|YES|||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||,T|downstream_gene_variant|MODIFIER|RP11-34P13.13|ENSG00000241860|Transcript|ENST00000484859|antisense||||||||||rs540391832|2622|-1||SNV|Clone_based_vega_gene||YES|||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||,T|downstream_gene_variant|MODIFIER|RP11-34P13.13|ENSG00000241860|Transcript|ENST00000490997|antisense||||||||||rs540391832|3956|-1||SNV|Clone_based_vega_gene|||||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||,T|downstream_gene_variant|MODIFIER|RP11-34P13.14|ENSG00000239906|Transcript|ENST00000493
797|antisense||||||||||rs540391832|938|-1||SNV|Clone_based_vega_gene||YES|||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||,T|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|Transcript|ENST00000494149|processed_pseudogene||||||||||rs540391832|2957|-1||SNV|Clone_based_vega_gene||YES|||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||,T|upstream_gene_variant|MODIFIER|RP11-34P13.16|ENSG00000269981|Transcript|ENST00000595919|processed_pseudogene||||||||||rs540391832|887|-1||SNV|Clone_based_vega_gene||YES|||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||,T|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00000249765|CTCF_binding_site||||||||||rs540391832||||SNV||||||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||,T|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00000918299|TF_binding_site||||||||||rs540391832||||SNV||||||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||;SpliceAI=T|AL627309.1|0.00|0.00|0.00|0.00|-2|-14|-2|-5

karynne7, Sep 28 '22 22:09

For now, I think it's better to do this type of thing with vembrane from @tedil as it can do this directly already.

For slivar, it's not yet implemented. It's non-trivial, in part because CSQ is actually a nested array: one array of fields per transcript. So you'd need something like (this is not implemented, just brainstorming):

INFO.CSQ[0].LoF == "HC" || INFO.CSQ[1].LoF == "HC"
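
Until something like that exists, the per-transcript check can be done outside slivar. The following is a minimal Python sketch, not slivar's or vembrane's API: the function names (csq_fields, parse_csq, any_transcript) and the shortened four-field header are made up for illustration. It assumes the CSQ field names come from the "Format: ..." part of the header line and that transcripts are comma-separated.

```python
def csq_fields(header_line):
    """Return the CSQ sub-field names from a ##INFO=<ID=CSQ,...> header line."""
    marker = "Format: "
    start = header_line.index(marker) + len(marker)
    end = header_line.index('"', start)  # closing quote of Description
    return header_line[start:end].split("|")


def parse_csq(info, fields):
    """Return one dict per transcript from a VCF INFO string.

    Caveat: if the header repeats a field name, the later value
    overwrites the earlier one in the dict.
    """
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            records = entry[len("CSQ="):].split(",")
            return [dict(zip(fields, rec.split("|"))) for rec in records]
    return []  # no CSQ annotation on this record


def any_transcript(info, fields, key, value):
    """True if at least one transcript has fields[key] == value."""
    return any(t.get(key) == value for t in parse_csq(info, fields))


# Hypothetical, shortened header and INFO string for illustration only.
header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
          'annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|LoF">')
fields = csq_fields(header)
info = "AC=1;CSQ=T|stop_gained|HIGH|HC,T|downstream_gene_variant|MODIFIER|;AN=700"
print(any_transcript(info, fields, "LoF", "HC"))
```

Looking fields up by name rather than by position (as in CSQ[70]) keeps the filter valid if the VEP plugin list, and therefore the field order, changes.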

There's no information about what's a string and what's not, so for numeric fields you'd still need to do:

parseFloat(INFO.CSQ[0].gnomAD_AMR_AF) < 0.01 ...
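
The same conversion issue shows up in any workaround script, because every CSQ sub-field arrives as a string and many are empty. Here is a small Python sketch of one way to handle it; to_float and any_below are hypothetical helper names, and the sketch assumes the transcripts have already been parsed into dicts keyed by field name. Empty or non-numeric values are skipped, which mirrors how a NaN comparison fails in JavaScript.

```python
def to_float(value):
    """Convert a CSQ string to float; return None if empty or non-numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def any_below(transcripts, key, threshold):
    """True if any transcript has a numeric value for `key` below `threshold`.

    Transcripts with a missing, empty, or non-numeric value never pass.
    """
    for transcript in transcripts:
        number = to_float(transcript.get(key))
        if number is not None and number < threshold:
            return True
    return False


# Hypothetical, already-parsed transcripts for illustration only.
transcripts = [
    {"Consequence": "stop_gained", "gnomAD_AMR_AF": "0.002325"},
    {"Consequence": "regulatory_region_variant", "gnomAD_AMR_AF": ""},
]
print(any_below(transcripts, "gnomAD_AMR_AF", 0.01))
```

Whether a missing frequency should count as "rare" is a design choice; this sketch treats it as not passing, so flip the None branch if absent values should pass the filter.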

but,

brentp, Sep 28 '22 22:09

Thank you, Brent! I'll look into it!

karynne7, Sep 28 '22 22:09