bcftools gtcheck ignores/skips sites with symbolic ALT alleles

Open shankarajays opened this issue 3 years ago • 0 comments

I ran into an issue with bcftools gtcheck (version 1.12) when comparing two VCFs, one of which has the symbolic ALT allele <NON_REF>. The output shows that the number of sites compared is 0 even though matching positions exist in the two VCFs. To investigate, I created a dummy query VCF with a single record.

Query VCF has the following line:

1        752721    .       A       G,<NON_REF>     185.44  PASS    DP=47;MQ=204.19;FractionInformativeReads=0.979  GT:AD:AF:DP:F1R2:F2R1:GQ:PL:SPL:ICNT:GP:PRI:SB:MB       1/1:0,46,0:1.000,0.000:46:0,24,0:0,22,0:135:223,138,0,1965,138,1965:255,141,0:0,0:1.8544e+02,1.3544e+02,0.0000e+00,4.5000e+02,1.7021e+02,4.5000e+02:0.00,34.77,37.77,34.77,69.54,37.77:0,0,30,16:0,0,28,18

Genotypes VCF has the following line:

1       752721  rs3131972       A       G       .       PASS    AL=A/G;ST=+     GT:GC   1/1:0.8366

bcftools command:

bcftools gtcheck  -e 0 --no-HWE-prob -u GT,GT -g genotypes.vcf.gz query.vcf.gz

Output:

#DC     [2]Query Sample [3]Genotyped Sample     [4]Discordance  [5]-log P(HWE)  [6]Number of sites compared
DC      DUMMYSAMPLE   WG0341934-DNAA01_NA12878        0       0.000000e+00    0

Deleting just the symbolic allele in the ALT field from the query VCF record produces the desired output:

#DC     [2]Query Sample [3]Genotyped Sample     [4]Discordance  [5]-log P(HWE)  [6]Number of sites compared
DC      DUMMYSAMPLE   WG0341934-DNAA01_NA12878        0       0.000000e+00    1
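
For anyone who needs to reproduce this manual edit across a whole file, here is a minimal Python sketch of the workaround. It is not part of bcftools: the function name `strip_symbolic_alts` is made up for illustration, and it assumes tab-separated VCF data lines in which the symbolic allele is the last ALT entry, so GT indices do not need renumbering. It leaves per-allele FORMAT fields such as AD and PL untouched, just like the manual edit, so the result is only meant for a GT-only comparison like the `-u GT,GT` run above.

```python
import re

# Matches a symbolic allele such as <NON_REF> or <*>.
SYMBOLIC = re.compile(r"^<.*>$")


def strip_symbolic_alts(line: str) -> str:
    """Drop trailing symbolic ALT alleles from one tab-separated VCF line.

    Hypothetical helper for illustration only; AD, PL and other
    per-allele fields are NOT adjusted.
    """
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    alts = fields[4].split(",")  # column 5 is ALT
    kept = [a for a in alts if not SYMBOLIC.match(a)]
    # Only trailing symbolic alleles can be dropped without renumbering GT.
    if alts[: len(kept)] != kept:
        raise ValueError("symbolic allele is not last; GT indices would shift")
    fields[4] = ",".join(kept) if kept else "."
    return "\t".join(fields) + ("\n" if line.endswith("\n") else "")


# Shortened version of the query record from this report.
record = "1\t752721\t.\tA\tG,<NON_REF>\t185.44\tPASS\tDP=47\tGT\t1/1"
print(strip_symbolic_alts(record).split("\t")[4])  # G
```

A stripped file would still need to be bgzip-compressed and indexed before being passed to gtcheck.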

shankarajays, May 03 '21 19:05