
Canonical way to store BGEN phased probability data in a VCF

Open aoblebea opened this issue 4 years ago • 0 comments

Hello,

I am trying to store phased probability data ingested from a BGEN file in a VCF. The BGEN format stores these probabilities per haplotype per allele. From what I can tell, the obvious VCF candidate fields (GL, PL, etc.) instead use "canonical order", which the BGEN specification calls "colex order" and uses only for unphased probability data. Because these fields describe unordered combinations of alleles, they cannot record phased probability data without losing the phase information. Could the PS (phase set) and PQ (phasing quality) fields somehow be used to retain this information? I am mostly interested in the diploid case.
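To make the loss of phase concrete, here is a minimal sketch for the diploid case. It assumes the two haplotypes are independent, so that the probability of allele `a` on haplotype 1 and allele `b` on haplotype 2 is the product of the two per-haplotype probabilities. It then folds the per-haplotype allele probabilities into unphased genotype probabilities, using the VCF canonical index `k*(k+1)/2 + j` for genotype `j/k` with `j <= k`. The function names `genotype_index` and `phased_to_unphased` are hypothetical helpers for illustration, not part of any BGEN or VCF library.

```python
def genotype_index(j: int, k: int) -> int:
    """VCF canonical (BGEN colex) index of the unordered diploid genotype j/k."""
    if j > k:
        raise ValueError("expected j <= k")
    return k * (k + 1) // 2 + j


def phased_to_unphased(hap1: list[float], hap2: list[float]) -> list[float]:
    """Collapse per-haplotype allele probabilities into unphased diploid
    genotype probabilities in canonical order.

    Assumption (not stated by either spec): the two haplotypes are
    independent, so P(a on hap1, b on hap2) = hap1[a] * hap2[b].
    """
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must have the same number of alleles")
    n_alleles = len(hap1)
    probs = [0.0] * (n_alleles * (n_alleles + 1) // 2)
    for a in range(n_alleles):
        for b in range(n_alleles):
            # Ordered pair (a, b) maps to the unordered genotype min/max,
            # which is exactly where the phase information is discarded.
            probs[genotype_index(min(a, b), max(a, b))] += hap1[a] * hap2[b]
    return probs


if __name__ == "__main__":
    # A certain 0|1 call and a certain 1|0 call give the same unphased vector.
    print(phased_to_unphased([1.0, 0.0], [0.0, 1.0]))  # [0.0, 1.0, 0.0]
    print(phased_to_unphased([0.0, 1.0], [1.0, 0.0]))  # [0.0, 1.0, 0.0]
```

Because the map from ordered pairs `(a, b)` to unordered genotypes is many-to-one, no field in canonical order can distinguish `0|1` from `1|0`. Any phased representation therefore needs per-haplotype values rather than per-genotype values.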

aoblebea, Oct 18 '21 23:10