bcftools

norm -m + --atomize inconsistent representation of complex variants

Open mi3112 opened this issue 1 month ago • 0 comments

I am working with a WES tumor-only experiment. Variant calling was performed using three different tools:

  • Mpileup
  • Mutect2
  • Freebayes

(The pictures follow the same order.)

Image

All three callers detect the same complex variant, but each represents it differently in the original VCF.

To normalize the variants, I used the following command:

bcftools norm --atomize -f ref.fasta -o output.vcf input.vcf

Before normalization, I have these representations:

Freebayes

chr17	7675081	.	GGGGCAGC	GGA

Mutect2

chr17	7675082	.	GGGC	G	
chr17	7675086	.	AGC	A	

Mpileup

chr17	7675081	.	GGGGCAG	GG	
chr17	7675088	.	C	A	

After normalization, the result was:

Freebayes

chr17	7675083	.	GGCAGC	A

Mutect2

chr17	7675082	.	GGGC	G	
chr17	7675086	.	AGC	A

Mpileup

chr17	7675081	.	GGGGCA	G	
chr17	7675088	.	C	A

Even after applying bcftools norm --atomize, the same biological variant is still represented differently across callers:

  • Different POS
  • Different decomposition boundaries
  • Different REF/ALT lengths
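To check that these records really describe the same event, each caller's records can be applied to the reference window and the resulting sequences compared. The following is only a minimal sketch, under two assumptions: the reference at chr17:7675081-7675088 is GGGGCAGC (inferred from the REF alleles listed above, not read from ref.fasta), and all records from one caller lie on the same haplotype. `apply_records` is a hypothetical helper written for this illustration, not part of bcftools.

```python
# Reference window chr17:7675081-7675088.
# ASSUMPTION: inferred from the REF alleles in the records above, not from ref.fasta.
WINDOW_START = 7675081
WINDOW_REF = "GGGGCAGC"


def apply_records(records, start=WINDOW_START, ref=WINDOW_REF):
    """Apply non-overlapping (pos, ref, alt) records to the window.

    Hypothetical helper: assumes all records are on the same haplotype.
    Returns the alternate sequence of the whole window.
    """
    out = []
    cursor = 0  # 0-based offset of the next unconsumed reference base
    for pos, rec_ref, rec_alt in sorted(records):
        offset = pos - start
        if offset < cursor:
            raise ValueError(f"overlapping record at {pos}")
        if ref[offset:offset + len(rec_ref)] != rec_ref:
            raise ValueError(f"REF mismatch at {pos}")
        out.append(ref[cursor:offset])  # untouched reference bases before the record
        out.append(rec_alt)             # the record's ALT allele
        cursor = offset + len(rec_ref)
    out.append(ref[cursor:])            # reference bases after the last record
    return "".join(out)


# Records copied from the VCF lines above, before and after `bcftools norm --atomize`.
calls = {
    "freebayes_raw": [(7675081, "GGGGCAGC", "GGA")],
    "mutect2_raw": [(7675082, "GGGC", "G"), (7675086, "AGC", "A")],
    "mpileup_raw": [(7675081, "GGGGCAG", "GG"), (7675088, "C", "A")],
    "freebayes_norm": [(7675083, "GGCAGC", "A")],
    "mutect2_norm": [(7675082, "GGGC", "G"), (7675086, "AGC", "A")],
    "mpileup_norm": [(7675081, "GGGGCA", "G"), (7675088, "C", "A")],
}

haplotypes = {name: apply_records(recs) for name, recs in calls.items()}
for name, hap in haplotypes.items():
    print(name, hap)
```

Under these assumptions every entry reconstructs to the same window sequence, GGA, so the call sets agree at the haplotype level even though a record-by-record comparison on POS/REF/ALT does not match.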

I was expecting --atomize to produce a canonical, consistent representation across callers (same coordinates and minimal atomic variants), but this did not happen.

Is this behavior expected? Does --atomize intentionally preserve caller-specific breakpoints or representations?

Is there a recommended way to obtain an identical representation for complex variants across different callers, so that intersections/overlaps between VCFs can be computed reliably?

mi3112, Nov 24 '25 13:11