htslib
htslib copied to clipboard
`bcftools view` with region filter drops variants at POS=0 (telomeres)
Given a bgzipped VCF that contains a telomere indicated by position 0 (as per the VCF spec):
$ zcat example.vcf.gz | grep -v '#'
22 0 ID1 C G . PASS . GT 1/0
22 1 ID2 C G . PASS . GT 1/0
(Disclaimer: I know next to nothing about telomeres, and have really only ended up here due to invalid, non-int positions getting converted to 0, as discussed in https://github.com/samtools/htslib/issues/1570. My example undoubtedly makes no realistic sense!)
A simple bcftools view
works as expected:
$ bcftools view -H example.vcf.gz
22 0 ID1 C G . PASS . GT 1/0
22 1 ID2 C G . PASS . GT 1/0
Applying a region filter however unexpectidly drops the telomere (albeit with a warning referring to a flag that I don't think exists in this context?):
$ bcftools view -H --regions 22 example.vcf.gz
[W::tbx_parse1] Coordinate <= 0 detected. Did you forget to use the -0 option?
22 1 ID2 C G . PASS . GT 1/0
$ echo $?
0
It seems like the telomere should be included with this filter, given its chr; and I haven't encountered anywhere in the docs that indicates the observed behaviour. However, there certainly could be some biological reason why that doesn't make sense!
The way it gets dropped can cause an interesting issue when considred with https://github.com/samtools/htslib/issues/1570, as it means variants where (for whatever reason...) POS
gets corruped, get converted to 0
and then dropped when a pipe goes on to have a region filter!
What is the version of bcftools you are using? I am unable to reproduce with the latest
I'm seeing this with a build of the latest version (BCFtools fe98a6b54536246ce650ad5fde0b75e99bb22fa0
; htslib 70df0479771fa28e927c01de4c5ca95f1e63ebba
), on Ubuntu 22.04 with:
cd htslib
autoreconf --install
./configure
make
cd ../bcftools
autoreconf
./configure
make
Also the same with 1.17
.
$ bcftools view -H --regions 22 example.vcf.gz
[W::tbx_parse1] Coordinate <= 0 detected. Did you forget to use the -0 option?
22 1 ID2 C G . PASS . GT 1/0
$ bcftools --version
bcftools 1.17-4-gfe98a6b
Using htslib 1.17-1-g70df047
Copyright (C) 2023 Genome Research Ltd.
License Expat: The MIT/Expat license
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Just to confirm, I was able to reproduce the problem now
$ echo -e "##fileformat=VCFv4.2\n##contig=<ID=22>\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n22\t0\tID0\tC\tG\t.\tPASS\t.\n22\t1\tID1\tC\tG\t.\tPASS\t." | bcftools view -o rmme.vcf.gz
$ bcftools index -f rmme.vcf.gz
[W::tbx_parse1] Coordinate <= 0 detected. Did you forget to use the -0 option?
$ bcftools --version
bcftools 1.17-2-gdb8d6573
Using htslib 1.17-1-g70df047
The BCF format works
$ echo -e "##fileformat=VCFv4.2\n##contig=<ID=22>\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n22\t0\tID0\tC\tG\t.\tPASS\t.\n22\t1\tID1\tC\tG\t.\tPASS\t." | bcftools view -o rmme.bcf
$ bcftools index -f rmme.bcf
I don't think this has been completed
I don't think this has been completed
Apologies! I spent a few minutes yesterday follow up my old PRs/issues - but I was clearly too trigger happy here when I saw that commits had been added referencing this ticket.