bcftools

Filter a gene list including intergenic regions

Open ccbruels opened this issue 1 year ago • 3 comments

Hi,

I see how to filter by a gene list for most SNVs/indels in the issue "Filter a gene list" (#1964).

However, I want to look at intergenic variants as well. For those, ANNOVAR puts more than one gene name in the Gene.refGene field, for example Gene.refGene=FAM138A\x3bOR4F5.

If my gene.txt file only contains FAM138A, the intergenic variants are not included.

I'm using bcftools v1.21. My command is in the format bcftools view -i '[email protected] ' file.vcf

Including wildcards in the command or in the genes.txt file didn't work.

Do you have any suggestions?

ccbruels, Nov 12 '24 21:11

The problem is somewhat confusing as stated: you say you want to filter in intergenic regions, but the example you gave seems unrelated. Instead, it seems the variant is in two overlapping genes (here FAM138A and OR4F5) and the problem is that matching by gene name does not work for these records. So I am unsure what it is you want.

pd3, Nov 18 '24 14:11

Perhaps I picked a bad example: that variant was tagged as intergenic by ANNOVAR, but I did not look at it in a genome browser.

Looking at another, clearly intergenic variant, here is the ANNOVAR VCF output:

chr1 3439841 . A C 31.76 PASS P;ANNOVAR_DATE=2020-06-08;Func.refGene=intergenic;Gene.refGene=PRDM16\x3bARHGEF16;GeneDetail.refGene=dist\x3d1220\x3bdist\x3d14824;ExonicFunc.refGene=.;AAChange.refGene=.;Xref.refGene=.;avsnp151=rs2483250;gnomad41_genome_AF=0.8166;gnomad41_genome_AF_raw=0.8160;CLNSIG=.
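For readers unfamiliar with the escapes in that record: as far as I understand ANNOVAR's VCF output, `\x3b` and `\x3d` are literal four-character sequences standing for `;` and `=`, which cannot appear unescaped inside an INFO value. For an intergenic variant, Gene.refGene then appears to hold the two flanking genes and GeneDetail.refGene their distances. A minimal sketch for viewing the decoded values, where `decode_annovar` is a hypothetical helper name and not part of ANNOVAR or bcftools:

```shell
# Hypothetical helper: replace ANNOVAR's hex escapes with the characters
# they stand for. The result is for reading only; it is no longer a valid
# INFO string, because ';' and '=' are reserved there.
decode_annovar() {
    sed -e 's/\\x3b/;/g' -e 's/\\x3d/=/g'
}

# The two annotation fields from the record above, decoded.
printf '%s\n' 'Gene.refGene=PRDM16\x3bARHGEF16' | decode_annovar
printf '%s\n' 'GeneDetail.refGene=dist\x3d1220\x3bdist\x3d14824' | decode_annovar
```

This is why a gene list containing only ARHGEF16 would not equal the stored value: the field holds the whole string PRDM16\x3bARHGEF16.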

My question is: how would I filter for this variant if I am looking for variants flagged as intergenic, but specifically variants that might affect ARHGEF16? I have a very large list of genes, and it would be difficult to correctly list all of the possible variations if I want to find intergenic variants near it.

ccbruels, Nov 18 '24 22:11

The question seems focused on the variant being intergenic. I am sorry but I still don't understand what is not working for you exactly. Can you provide a small test case, a VCF with full header, the gene list you are using, the command which is not working for you, and the output you expect?

I see the VCF has the Func.refGene=intergenic field and also the gene name, so I would expect it to be possible to combine the two in the filtering expression, as in -i 'Func.refGene="intergenic" && Gene.refGene="PRDM16\x3bARHGEF16"'. I am wondering whether the \x3b part is the cause of the problems. It would help to provide more information, as indicated above.
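If comparing against the escaped string does turn out to be the sticking point, one possible workaround is to do the gene matching outside bcftools. The following is only a sketch under stated assumptions, not a bcftools feature: `filter_intergenic_by_gene` is a hypothetical helper, the gene file is assumed to hold one gene name per line, the VCF is assumed to be uncompressed and tab-separated, and the annotations are assumed to sit in the INFO column exactly as in the record quoted earlier.

```shell
# Hypothetical helper: print VCF header lines plus every record that is
# annotated Func.refGene=intergenic and whose Gene.refGene value names at
# least one gene from the list (one gene name per line).
filter_intergenic_by_gene() {
    genes_file=$1
    vcf_file=$2
    awk -F'\t' '
        # First file: load the gene names into a lookup table.
        NR == FNR { if ($1 != "") wanted[$1] = 1; next }
        # Second file: pass header lines through unchanged.
        /^#/ { print; next }
        {
            func_val = ""; gene_val = ""
            # INFO is column 8; its entries are separated by real semicolons.
            n = split($8, info, ";")
            for (i = 1; i <= n; i++) {
                # "Func.refGene=" and "Gene.refGene=" are both 13 characters.
                if (info[i] ~ /^Func\.refGene=/) func_val = substr(info[i], 14)
                if (info[i] ~ /^Gene\.refGene=/) gene_val = substr(info[i], 14)
            }
            if (func_val != "intergenic") next
            # Turn the literal text \x3b into a comma, then test each name.
            gsub(/\\x3b/, ",", gene_val)
            m = split(gene_val, names, ",")
            for (j = 1; j <= m; j++) if (names[j] in wanted) { print; next }
        }
    ' "$genes_file" "$vcf_file"
}
```

It could be called as `filter_intergenic_by_gene genes.txt file.vcf > intergenic_hits.vcf`. Because each flanking gene is tested separately, the gene list does not need to contain every possible combined value such as PRDM16\x3bARHGEF16.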

pd3, Dec 01 '24 18:12