bcftools icon indicating copy to clipboard operation
bcftools copied to clipboard

bcftools annotate does not overwrite existing entries in header with -h flag

Open rickymagner opened this issue 2 years ago • 2 comments

Hi, this issue concerns bcftools annotate and its optional header -h input. I'll demonstrate with an example below (all run with v1.18).

Suppose you have the following test.vcf:

##fileformat=VCFv4.3
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=1,length=10000000>
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NA00001	NA00002	NA00003
1	10000	.	GA	G	30	PASS	DP=14	GT	0|0	1|0	1/1
1	20000	.	T	TA	30	PASS	DP=11	GT	0|0	0|1	0/0

I want to update the DP field using the following annotations.tsv:

1       10000   15
1       20000   20

Then I run

bgzip annotations.tsv
tabix -s1 -b2 -e2 annotations.tsv.gz

to index the annotations. But suppose I also want to update the DP header line to have Number=A instead of Number=1. So I make annotations.hdr to be:

##INFO=<ID=DP,Number=A,Type=Integer,Description="Total Depth">

Finally, I run

bcftools annotate -a annotations.tsv.gz -h annotations.hdr -c CHROM,POS,INFO/DP -o "output.vcf.gz" test.vcf

which has the following output:

##fileformat=VCFv4.3
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=1,length=10000000>
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##bcftools_annotateVersion=1.18-10-gbe695393+htslib-1.17-48-gf4a3b99
##bcftools_annotateCommand=annotate -a annotations.tsv.gz -h annotations.hdr -c CHROM,POS,INFO/DP -o output.vcf.gz test.vcf; Date=Thu Sep 14 18:12:41 2023
##bcftools_viewVersion=1.18-10-gbe695393+htslib-1.17-48-gf4a3b99
##bcftools_viewCommand=view output.vcf.gz; Date=Thu Sep 14 18:17:49 2023
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NA00001	NA00002	NA00003
1	10000	.	GA	G	30	PASS	DP=15	GT	0|0	1|0	1/1
1	20000	.	T	TA	30	PASS	DP=20	GT	0|0	0|1	0/0

So the DP fields update accordingly, but the header field is still the same as the old.

Normally, if the header field was missing, it would be added using the -h flag. Is it possible to add functionality so that the -h will also allow you to manually overwrite the existing header lines for the appropriate INFO field changed?

In general, I think bcftools reheader only allows you to update the sequence dictionary using a template. It would be nice if bcftools had a better way to manually update other header lines like in the above use case, especially if the two instances across these tools are related.

rickymagner avatar Sep 14 '23 22:09 rickymagner