Handling of Phasing Information (PS Tags) with bcftools liftover
Hi,
I'm working on comparative genomics across divergent species. For each species/population, I map the sequencing data to its own reference genome, do variant calling, and apply statistical phasing to generate phased VCFs with genotype phase (|) and phase set (PS) annotations.
After this, I want to convert the population-specific VCFs to a common reference genome using bcftools liftover. However, I could not find documentation specifying how bcftools liftover handles phasing information during coordinate conversion — specifically, whether PS tags are retained and remain valid after the liftover process, and if phasing structure is preserved when variants move relative to each other.
Does bcftools liftover preserve phasing information correctly during coordinate conversion? Are there any caveats I should be aware of regarding phasing sets after liftover?
Thank you!
BCFtools/liftover will not change the phase status of the alleles. If two nearby variants are lifted over to positions near each other, their phase should still be consistent. That said, when they are lifted over to positions far apart, you should not expect the phase between them to remain meaningful. Because of this I would advise against lifting over imputed genotypes. It would be better to lift over the pre-imputation genotypes and then run imputation again.
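As a rough way to spot phase sets affected by this, one could compare where the members of each phase set land after liftover. The sketch below is only an illustration and not a bcftools feature: the function name `suspicious_phase_sets`, the tuple layout, and the 1 Mb default threshold are all assumptions made for this example. It assumes you have already extracted, for each phased variant, its source contig, its PS value, and its lifted-over contig and position.

```python
from collections import defaultdict

def suspicious_phase_sets(records, max_span=1_000_000):
    """Flag phase sets whose variants were scattered by the liftover.

    records: iterable of (src_chrom, ps, new_chrom, new_pos) tuples
             (hypothetical layout, one tuple per phased variant).
    max_span: arbitrary illustrative threshold in base pairs.
    """
    # Key on (source contig, PS) because PS values are often only
    # unique within a contig.
    by_ps = defaultdict(list)
    for src_chrom, ps, new_chrom, new_pos in records:
        if ps is not None:
            by_ps[(src_chrom, ps)].append((new_chrom, new_pos))

    flagged = {}
    for key, sites in by_ps.items():
        chroms = {chrom for chrom, _ in sites}
        if len(chroms) > 1:
            flagged[key] = "multiple chromosomes"
            continue
        positions = [pos for _, pos in sites]
        if max(positions) - min(positions) > max_span:
            flagged[key] = "span exceeds threshold"
    return flagged
```

Phase sets reported here are candidates for re-phasing on the target reference rather than being trusted as lifted.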
If you are curious, you can check how genotype updates are handled by looking at the update_genotypes() function:
- if alleles are not swapped during the liftover process, then update_genotypes() is not called
- if two alleles are swapped, the corresponding alleles are remapped to each other
- if a new reference allele is added, then alleles are shifted by one
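The three cases above can be illustrated with a small Python sketch. This is not the plugin's actual C code, only a model of the behavior described: the helper names `remap_genotype`, `SWAP_REF_ALT`, and `shift_map` are hypothetical. The point it shows is that only allele indices change, while the `|` and `/` separators (the phase status) are left as they are.

```python
import re

def remap_genotype(gt, allele_map):
    """Remap allele indices in a GT string, leaving separators untouched.

    allele_map: dict from old allele index to new allele index;
                indices not in the map are kept unchanged.
    """
    # Split into allele tokens and separators ('|' phased, '/' unphased).
    tokens = re.split(r"([|/])", gt)
    out = []
    for tok in tokens:
        if tok in ("|", "/", "."):
            out.append(tok)  # separators and missing alleles pass through
        else:
            idx = int(tok)
            out.append(str(allele_map.get(idx, idx)))
    return "".join(out)

# Case 2: reference and first alternate allele are swapped.
SWAP_REF_ALT = {0: 1, 1: 0}

def shift_map(n_alleles):
    """Case 3: a new reference allele is added, so every index moves up by one."""
    return {i: i + 1 for i in range(n_alleles)}
```

For example, `remap_genotype("0|1", SWAP_REF_ALT)` gives `"1|0"`, and `remap_genotype("0|1", shift_map(2))` gives `"1|2"`. In both cases the genotype stays phased and the two haplotypes keep their order.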