slivar
VEP annotations other than impact?
Is there a way to pull annotations from the CSQ field other than the built-in INFO.impactful? I see that some of the fields need to be integers or flags, but I have LOFTEE annotations with "HC" or "LC" that I'd like to filter on. (Yes, these could be turned into a flag, but there are other, less binary annotations too.)
I was thinking about bcftools view -i 'INFO/CSQ[70]=="HC"', but that didn't work, and I'm not sure it would capture the whole annotation if there were multiple transcripts. Happy to write my own workaround, but figured I'd double-check with you first! Thanks!
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|VARIANT_CLASS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|TSL|APPRIS|CCDS|ENSP|SWISSPROT|TREMBL|UNIPARC|GENE_PHENO|NEAREST|SIFT|PolyPhen|DOMAINS|miRNA|HGVS_OFFSET|AF|AFR_AF|AMR_AF|EAS_AF|EUR_AF|SAS_AF|AA_AF|EA_AF|gnomAD_AF|gnomAD_AFR_AF|gnomAD_AMR_AF|gnomAD_ASJ_AF|gnomAD_EAS_AF|gnomAD_FIN_AF|gnomAD_NFE_AF|gnomAD_OTH_AF|gnomAD_SAS_AF|MAX_AF|MAX_AF_POPS|CLIN_SIG|SOMATIC|PHENO|PUBMED|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|LoF|LoF_filter|LoF_flags|LoF_info|existing_InFrame_oORFs|existing_OutOfFrame_oORFs|existing_uORFs|five_prime_UTR_variant_annotation|five_prime_UTR_variant_consequence|REVEL|SpliceRegion|AF_gnomadall|ALT|AUG_eventAUG_event|var_frame|var_stop_loc|var_kozak|ref_frame|ref_stop_loc|ref_kozakvar_frameAUG_event|var_frame|var_stop_loc|var_kozak|ref_frame|ref_stop_loc|ref_kozakvar_stop_locAUG_event|var_frame|var_stop_loc|var_kozak|ref_frame|ref_stop_loc|ref_kozakvar_kozakAUG_event|var_frame|var_stop_loc|var_kozak|ref_frame|ref_stop_loc|ref_kozakref_frameAUG_event|var_frame|var_stop_loc|var_kozak|ref_frame|ref_stop_loc|ref_kozakref_stop_locAUG_event|var_frame|var_stop_loc|var_kozak|ref_frame|ref_stop_loc|ref_kozakref_kozak|CADD_PHRED|CLNSIG|CLNSIGINCL|CLNVC|CLNVI|Delta_G4|Delta_dsRNA|FREQ_exomes|FREQ_genomes|FREQ_gnomadall|G4|GENEINFO|Literature_source|MC|NCboost_chr_rank_perc|PMID|REF|ReMM_probability|STOP_eventSTOP_event|var_preceeding_start_loc|ref_preceeding_start_locvar_preceeding_start_locSTOP_event|var_preceeding_start_loc|ref_preceeding_start_locref_preceeding_start_loc|TE|TE_log2fold|canonical|chr|distance|dsRNA|gene_name|pos|ref|strand|transcript|transcript_ver|type|var">
1 138852 . C T . . AC=1;AN=700;HPO_CT=1;GENE=LOC729737;MRNA=NR_039983.2;FXN=non-coding-exon;HGVS_CDNA=n.1395G>A;HGVS_PROT=.;ESP_AF=0;GNOMAD_AF=0.000106689;G1K_AF=0;CADD=27.2;nhet_aff=1;nhet_unaff=0;nhomalt_aff=0;nhomalt_unaff=0;nhet_female_aff=0;nhet_female_unaff=0;nhomalt_female_aff=0;nhomalt_female_unaff=0;nhet_male_aff=1;nhet_male_unaff=0;nhomalt_male_aff=0;nhomalt_male_unaff=0;maxAAF=0.00173611;bravo_AF=0.000615815;CSQ=T|stop_gained|HIGH|AL627309.1|ENSG00000237683|Transcript|ENST00000423372|protein_coding|1/2||ENST00000423372.3:c.458G>A|ENSP00000473460.1:p.Trp153Ter|528/2661|458/780|153/259|W/*|tGg/tAg|rs540391832||-1||SNV|Clone_based_ensembl_gene||YES||||ENSP00000473460||R4GN28&B7Z7W4|UPI0002C88512||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||HC|||GERP_DIST:0&BP_DIST:1223&PERCENTILE:0.587179487179487179&DIST_FROM_LAST_EXON:374&50_BP_RULE:PASS&PHYLOCSF_TOO_SHORT|||||||||||||||||||||||||||||||||||||||||||,T|downstream_gene_variant|MODIFIER|CICP27|ENSG00000233750|Transcript|ENST00000442987|processed_pseudogene||||||||||rs540391832|4016|1||SNV|HGNC|48835|YES|||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||,T|downstream_gene_variant|MODIFIER|RP11-34P13.13|ENSG00000241860|Transcript|ENST00000484859|antisense||||||||||rs540391832|2622|-1||SNV|Clone_based_vega_gene||YES|||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||,T|downstream_gene_variant|MODIFIER|RP11-34P13.13|ENSG00000241860|Transcript|ENST00000490997|antisense||||||||||rs540391832|3956|-1||SNV|Clone_based_vega_gene|||||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||,T|downstream_gene_variant|MODIFIER|RP11-34P13.14|ENSG00000239906|Transcript|ENST00000493
797|antisense||||||||||rs540391832|938|-1||SNV|Clone_based_vega_gene||YES|||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||,T|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|Transcript|ENST00000494149|processed_pseudogene||||||||||rs540391832|2957|-1||SNV|Clone_based_vega_gene||YES|||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||,T|upstream_gene_variant|MODIFIER|RP11-34P13.16|ENSG00000269981|Transcript|ENST00000595919|processed_pseudogene||||||||||rs540391832|887|-1||SNV|Clone_based_vega_gene||YES|||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||,T|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00000249765|CTCF_binding_site||||||||||rs540391832||||SNV||||||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||,T|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00000918299|TF_binding_site||||||||||rs540391832||||SNV||||||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||;SpliceAI=T|AL627309.1|0.00|0.00|0.00|0.00|-2|-14|-2|-5
For now, I think it's better to do this type of thing with vembrane from @tedil as it can do this directly already.
For slivar, it's not yet implemented. This is non-trivial, in part because CSQ is actually a nested array, with one array per transcript. So you'd need something like the following (this is not implemented, just brainstorming):
INFO.CSQ[0].LoF == "HC" || INFO.CSQ[1].LoF == "HC"
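As a stopgap outside slivar, the per-transcript check can be sketched in plain Python. This is only a minimal illustration, not slivar or vembrane functionality: the helper names (csq_fields, parse_csq, any_transcript) and the shortened toy header are made up for the example. It assumes the CSQ sub-field names are unique; with duplicated names in the header Format string, later columns would overwrite earlier ones.

```python
import re


def csq_fields(header_line):
    """Return the CSQ sub-field names from a VEP ##INFO=<ID=CSQ,...> header line."""
    match = re.search(r'Format: ([^">]+)', header_line)
    if match is None:
        raise ValueError("no 'Format: ...' found in CSQ header")
    return match.group(1).strip().split("|")


def parse_csq(info, fields):
    """Split the CSQ value of an INFO string into one dict per transcript."""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            # Transcripts are comma-separated; sub-fields are pipe-separated.
            return [dict(zip(fields, t.split("|"))) for t in entry[4:].split(",")]
    return []


def any_transcript(info, fields, key, value):
    """True if at least one transcript has sub-field `key` equal to `value`."""
    return any(t.get(key) == value for t in parse_csq(info, fields))


# Toy header and INFO string (hypothetical, shortened to four sub-fields).
header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
          'annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|LoF">')
fields = csq_fields(header)
info = "AC=1;CSQ=T|stop_gained|HIGH|HC,T|downstream_gene_variant|MODIFIER|;AN=700"

print(any_transcript(info, fields, "LoF", "HC"))
```

Looking the sub-field up by name from the header avoids hard-coding a position such as 70, which would change whenever the VEP plugin set changes.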
There's no information about which fields are strings and which are not, so for numeric fields you'd still need to do:
parseFloat(INFO.CSQ[0].gnomAD_AMR_AF) < 0.01 ...
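The same type problem shows up in any hand-rolled workaround: every CSQ sub-field arrives as text, and many are empty. A small Python sketch of the conversion step follows. The helper names (to_float, all_below) and the toy values are hypothetical, and treating missing values as passing the filter is an assumption that may not suit every analysis.

```python
def to_float(value):
    """Convert a CSQ sub-field to float; return None for empty or non-numeric text."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def all_below(transcripts, key, threshold):
    """True if every transcript with a numeric value for `key` is below `threshold`.

    Transcripts where the value is missing or non-numeric are ignored
    (an assumption made for this sketch).
    """
    values = [to_float(t.get(key)) for t in transcripts]
    numeric = [v for v in values if v is not None]
    return all(v < threshold for v in numeric)


# Toy per-transcript dicts (hypothetical values, not taken from a real record).
transcripts = [
    {"LoF": "HC", "gnomAD_AMR_AF": "0.004"},
    {"LoF": "", "gnomAD_AMR_AF": ""},
]

print(all_below(transcripts, "gnomAD_AMR_AF", 0.01))
```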
but,
Thank you, Brent! I'll look into it!