bcftools
bcftools view losing results when widening the range.
Using the latest release, bcftools 1.12, I've been hitting an issue where entries seem to be lost when the query range is widened:
/data/bcftools-1.12/bcftools view /mnt/results/pipeline/sample/sample.g.vcf.gz -r chr17:150000-170000 -O v
##contig=<ID=HLA-DRB1*15:03:01:01,length=11567,assembly=Homo_sapiens_assembly38.index>
##contig=<ID=HLA-DRB1*15:03:01:02,length=11569,assembly=Homo_sapiens_assembly38.index>
##contig=<ID=HLA-DRB1*16:02:01,length=11005,assembly=Homo_sapiens_assembly38.index>
##source=HaplotypeCaller
##bcftools_viewVersion=1.12+htslib-
##bcftools_viewCommand=view -r chr17:150000-170000 -O v /mnt/results/pipeline/sample/sample.g.vcf.gz; Date=Wed Jun 16 12:43:47 2021
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
chr17 155883 . C <NON_REF> . . END=156518 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
chr17 161291 . C <NON_REF> . . END=161661 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
chr17 161912 . T <NON_REF> . . END=162399 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
chr17 163779 . A <NON_REF> . . END=164497 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
/data/NGS_Software/bcftools-1.12/bcftools view /mnt/results/pipeline/sample/sample.g.vcf.gz -r chr17:140000-170000 -O v
##contig=<ID=HLA-DRB1*15:03:01:01,length=11567,assembly=Homo_sapiens_assembly38.index>
##contig=<ID=HLA-DRB1*15:03:01:02,length=11569,assembly=Homo_sapiens_assembly38.index>
##contig=<ID=HLA-DRB1*16:02:01,length=11005,assembly=Homo_sapiens_assembly38.index>
##source=HaplotypeCaller
##bcftools_viewVersion=1.12+htslib-
##bcftools_viewCommand=view -r chr17:140000-170000 -O v /mnt/results/pipeline/sample/sample.g.vcf.gz; Date=Wed Jun 16 12:44:23 2021
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
chr17 163779 . A <NON_REF> . . END=164497 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
On another machine with bcftools 1.9 that has the same folder mounted I get what I believe to be the correct result.
/Software/NGS_Software/bcftools-1.9/bcftools/bcftools view /mnt/results/pipeline/sample/sample.g.vcf.gz -r chr17:0-170000 -O v
##bcftools_viewCommand=view -r chr17:0-170000 -O v /mnt/results/pipeline/sample/sample.g.vcf.gz; Date=Wed Jun 16 13:45:44 2021
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
chr17 155883 . C <NON_REF> . . END=156518 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
chr17 161291 . C <NON_REF> . . END=161661 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
chr17 161912 . T <NON_REF> . . END=162399 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
chr17 163779 . A <NON_REF> . . END=164497 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
I have re-indexed the gvcf file just to be sure it wasn't an indexing error.
gatk-4.1.4.0/gatk IndexFeatureFile -F sample.g.vcf.gz
It hasn't changed the behaviour.
Extracting the relevant lines with zcat yields this:
zcat /mnt/results/pipeline/sample/sample.g.vcf.gz | head -n 7494428 | tail -n 20
chr16 90175398 . G <NON_REF> . . END=90175410 GT:DP:GQ:MIN_DP:PL 0/0:4:6:2:0,6,82
chr16 90175411 . G <NON_REF> . . END=90175423 GT:DP:GQ:MIN_DP:PL 0/0:2:3:1:0,3,39
chr16 90175424 . C <NON_REF> . . END=90175427 GT:DP:GQ:MIN_DP:PL 0/0:2:6:2:0,6,88
chr16 90175428 . C <NON_REF> . . END=90175514 GT:DP:GQ:MIN_DP:PL 0/0:1:3:1:0,3,37
chr16 90175515 . A <NON_REF> . . END=90175615 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
chr16 90175616 . A <NON_REF> . . END=90175626 GT:DP:GQ:MIN_DP:PL 0/0:1:3:1:0,3,37
chr16 90177585 . T <NON_REF> . . END=90177964 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
chr16 90185840 . A <NON_REF> . . END=90186294 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
chr16 90222129 . A <NON_REF> . . END=90222626 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
chr17 155883 . C <NON_REF> . . END=156518 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
chr17 161291 . C <NON_REF> . . END=161661 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
chr17 161912 . T <NON_REF> . . END=162399 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
chr17 163779 . A <NON_REF> . . END=164497 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
chr17 172191 . A <NON_REF> . . END=172347 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
chr17 172348 . G <NON_REF> . . END=172356 GT:DP:GQ:MIN_DP:PL 0/0:1:3:1:0,3,38
chr17 172357 . T <NON_REF> . . END=172373 GT:DP:GQ:MIN_DP:PL 0/0:2:6:2:0,6,76
chr17 172374 . G <NON_REF> . . END=172375 GT:DP:GQ:MIN_DP:PL 0/0:3:9:3:0,9,128
chr17 172376 . A <NON_REF> . . END=172386 GT:DP:GQ:MIN_DP:PL 0/0:4:12:4:0,12,154
chr17 172387 . A <NON_REF> . . END=172401 GT:DP:GQ:MIN_DP:PL 0/0:5:15:5:0,15,195
chr17 172402 . C <NON_REF> . . END=172411 GT:DP:GQ:MIN_DP:PL 0/0:6:18:6:0,18,198
This behaviour also appears to be present in bcftools 1.10.2.
Can you please index with bcftools index instead and try again? If the problem persists, could you please provide a small test case, including your index? In my tests I was not able to reproduce the problem. I am assuming your bcftools and htslib are from the same release.
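For anyone following along, the suggested check could be scripted roughly as below. This is only a sketch: the function name reindex_and_query is hypothetical, and it assumes the file is bgzip-compressed and coordinate-sorted. It rebuilds the .tbi index with bcftools itself (overwriting the one written by GATK), then counts the records returned for a narrow and a wide region, since widening the region should never return fewer records.

```shell
#!/usr/bin/env bash
# Sketch only: re-index with bcftools, then compare record counts for a
# narrow and a wide region on the same file.
# Assumption: the input is bgzip-compressed and coordinate-sorted.
reindex_and_query() {
    local vcf="$1" narrow="$2" wide="$3"
    local n_narrow n_wide

    # -f overwrites an existing index, -t writes a tabix (.tbi) index
    bcftools index -f -t "$vcf" || return 1

    # -H suppresses the header so that only records are counted;
    # tr strips the padding some wc implementations add
    n_narrow=$(bcftools view -H -r "$narrow" "$vcf" | wc -l | tr -d ' ')
    n_wide=$(bcftools view -H -r "$wide" "$vcf" | wc -l | tr -d ' ')

    echo "narrow=${n_narrow} wide=${n_wide}"

    # The function fails if the wider region returned fewer records.
    [ "$n_wide" -ge "$n_narrow" ]
}
```

With the file from this report, the call would look like: reindex_and_query sample.g.vcf.gz chr17:150000-170000 chr17:140000-170000. A non-zero exit status after re-indexing would indicate that the problem persists with a bcftools-built index.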