bcftools
bcftools copied to clipboard
bcftools annotate does not overwrite existing entries in header with -h flag
Hi, this issue concerns bcftools annotate and its optional header -h input. I'll demonstrate with an example below (all run with v1.18).
Suppose you have the following test.vcf:
##fileformat=VCFv4.3
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=1,length=10000000>
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
1 10000 . GA G 30 PASS DP=14 GT 0|0 1|0 1/1
1 20000 . T TA 30 PASS DP=11 GT 0|0 0|1 0/0
I want to update the DP field using the following annotations.tsv:
1 10000 15
1 20000 20
Then I run
bgzip annotations.tsv
tabix -s1 -b2 -e2 annotations.tsv.gz
to index the annotations. But suppose I also want to update the DP header line to have Number=A instead of Number=1. So I make annotations.hdr to be:
##INFO=<ID=DP,Number=A,Type=Integer,Description="Total Depth">
Finally, I run
bcftools annotate -a annotations.tsv.gz -h annotations.hdr -c CHROM,POS,INFO/DP -o "output.vcf.gz" test.vcf
which has the following output:
##fileformat=VCFv4.3
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=1,length=10000000>
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##bcftools_annotateVersion=1.18-10-gbe695393+htslib-1.17-48-gf4a3b99
##bcftools_annotateCommand=annotate -a annotations.tsv.gz -h annotations.hdr -c CHROM,POS,INFO/DP -o output.vcf.gz test.vcf; Date=Thu Sep 14 18:12:41 2023
##bcftools_viewVersion=1.18-10-gbe695393+htslib-1.17-48-gf4a3b99
##bcftools_viewCommand=view output.vcf.gz; Date=Thu Sep 14 18:17:49 2023
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
1 10000 . GA G 30 PASS DP=15 GT 0|0 1|0 1/1
1 20000 . T TA 30 PASS DP=20 GT 0|0 0|1 0/0
So the DP fields update accordingly, but the header field is still the same as the old.
Normally, if the header field was missing, it would be added using the -h flag. Is it possible to add functionality so that the -h will also allow you to manually overwrite the existing header lines for the appropriate INFO field changed?
In general, I think bcftools reheader only allows you to update the sequence dictionary using a template. It would be nice if bcftools had a better way to manually update other header lines like in the above use case, especially if the two instances across these tools are related.