
bcftools norm shifts symbolic <DEL> to position 1 without warning if the END tag is missing from the VCF record

Open · davmlaw opened this issue 1 year ago · 2 comments

Found while testing the changes in #1919.

Omitting the END tag causes records with a symbolic <DEL> allele to be moved to position 1 with no warning (<DUP> records are unaffected).

Sample output line:

NC_000003.11	1	.	N	<DEL>	.	PASS	SVTYPE=DEL;SVLEN=-2666;BCFTOOLS_OLD_VARIANT=NC_000003.11|128204048|G|<DEL>

Command:

bcftools norm --fasta-ref=/data/annotation/fasta/GCF_000001405.25_GRCh37.p13_genomic.fna.gz --old-rec-tag=BCFTOOLS_OLD_VARIANT del_normalize_test_no_end.GRCh37.vcf

File: del_normalize_test_no_end.GRCh37.vcf.txt
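A normalized record can be checked for this kind of silent move by comparing its POS with the original position kept by `--old-rec-tag`. A minimal Python sketch, assuming tab-separated VCF text and the CHROM|POS|REF|ALT tag layout seen in the sample output line; `parse_info` and `position_shift` are hypothetical helpers, not part of bcftools or htslib:

```python
from __future__ import annotations


def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column into key -> value; flags map to ''."""
    fields: dict[str, str] = {}
    for item in info.split(";"):
        if item and item != ".":
            key, _, value = item.partition("=")
            fields[key] = value
    return fields


def position_shift(record: str, tag: str = "BCFTOOLS_OLD_VARIANT") -> int | None:
    """Return new POS minus the original POS stored in `tag`, or None if absent."""
    cols = record.rstrip("\n").split("\t")
    info = parse_info(cols[7])
    if tag not in info:
        return None
    # Assumed tag layout, as in the sample line: CHROM|POS|REF|ALT
    _chrom, old_pos, _ref, _alt = info[tag].split("|")
    return int(cols[1]) - int(old_pos)


sample = ("NC_000003.11\t1\t.\tN\t<DEL>\t.\tPASS\t"
          "SVTYPE=DEL;SVLEN=-2666;BCFTOOLS_OLD_VARIANT=NC_000003.11|128204048|G|<DEL>")
print(position_shift(sample))  # -128204047
```

Left-alignment normally moves a record by a few bases, so a shift of this size on a symbolic record stands out immediately.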

It is not clear to me from the VCF spec whether the END tag is required for symbolic variants.

The spec says that "an explicit END INFO field provides variant span information that is otherwise unknown. ... This field is used to compute BCF’s rlen field".
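To make the role of END concrete, here is a hedged Python sketch of deriving rlen from END, with SVLEN as a fallback when END is absent. `estimate_rlen` is a hypothetical helper, not taken from htslib. It assumes the VCF 4.2-style convention where POS is the padding base, so a deletion spans |SVLEN| + 1 reference bases (END = POS + |SVLEN|), and it only applies SVLEN to <DEL>, <DUP>, <INV> and <CNV> alleles:

```python
from __future__ import annotations

# Symbolic alleles whose reference span follows |SVLEN| (assumption; <INS> is
# excluded because an insertion does not consume reference bases).
SPANNING_PREFIXES = ("<DEL", "<DUP", "<INV", "<CNV")


def estimate_rlen(pos: int, ref: str, alts: list[str], info: dict[str, str]) -> int:
    """Hypothetical rlen estimate: END first, then SVLEN, then len(REF)."""
    if "END" in info:
        # END is 1-based and inclusive, so the span is END - POS + 1.
        return int(info["END"]) - pos + 1
    svlens = [abs(int(v)) for v in info.get("SVLEN", "").split(",") if v not in ("", ".")]
    if svlens and any(a.startswith(SPANNING_PREFIXES) for a in alts):
        # Assumes POS is the padding base, so END would be POS + |SVLEN|.
        return max(svlens) + 1
    # Without END or a usable SVLEN, only the REF allele length is known.
    return len(ref)


print(estimate_rlen(128204048, "G", ["<DEL>"], {"SVTYPE": "DEL", "SVLEN": "-2666"}))  # 2667
```

For the record in this report, the SVLEN fallback gives the same span as an explicit END=128206714 would, while a record with neither tag falls back to the one-base REF.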

Ideally, rlen could be derived from SVLEN, but if the END tag is required, it would be better to either:

  • Throw an error, or
  • Emit a warning about the missing END tag on the symbolic allele and skip the record
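The warn-and-skip option could be prototyped outside bcftools as a pre-filter. A minimal Python sketch, assuming plain uncompressed VCF text on input; `skip_symbolic_without_end` and `has_end` are hypothetical names, and excluding the gVCF placeholders `<*>` and `<NON_REF>` is an assumption:

```python
import sys
from typing import Callable, Iterable, Iterator

# gVCF placeholder alleles are excluded (assumption): they appear without END
# at ordinary variant sites.
IGNORED_SYMBOLIC = {"<*>", "<NON_REF>"}


def _stderr(msg: str) -> None:
    print(msg, file=sys.stderr)


def has_end(info: str) -> bool:
    """True if the INFO column contains an END key."""
    return any(item.split("=", 1)[0] == "END" for item in info.split(";"))


def skip_symbolic_without_end(
    lines: Iterable[str], warn: Callable[[str], None] = _stderr
) -> Iterator[str]:
    """Yield VCF lines, dropping (with a warning) symbolic records that lack END."""
    for line in lines:
        if line.startswith("#"):
            yield line  # header lines pass through unchanged
            continue
        cols = line.rstrip("\n").split("\t")
        symbolic = [
            a for a in cols[4].split(",")
            if a.startswith("<") and a.endswith(">") and a not in IGNORED_SYMBOLIC
        ]
        if symbolic and not has_end(cols[7]):
            warn(f"{cols[0]}:{cols[1]}: symbolic ALT {','.join(symbolic)} "
                 "has no END tag; skipping record")
            continue
        yield line
```

Raising an exception instead of calling `warn` would give the first option. Wrapped in a small script that reads `sys.stdin` and writes `sys.stdout`, it could sit in front of the normalization step in a pipe.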

Whichever is chosen, it would be nice for bcftools view to report it as well. Thanks!

davmlaw, Jul 01 '24 06:07

FYI, the END INFO field has been deprecated in VCF 4.5.

davmlaw, Jul 24 '24 09:07

I think bcftools does the right thing here by using rlen; this is an htslib issue instead.

davmlaw, Aug 12 '24 01:08