Specify that haploid genotypes are unphased in BCF

Open VorontsovIE opened this issue 4 years ago • 2 comments

I propose that the spec state explicitly that, for a haploid genotype, a BCF record should have an unphased GT field. It's not a big deal, but the current spec leaves unnecessary room for uncertainty.
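To make the ambiguity concrete, here is a minimal sketch. It assumes the BCF2 convention that each GT allele is stored as the integer `(allele + 1) << 1 | phased`. The function names are hypothetical and not taken from any library.

```python
def encode_gt_allele(allele_index, phased):
    """Encode one GT allele as a BCF integer, assuming (allele + 1) << 1 | phased."""
    return ((allele_index + 1) << 1) | int(phased)


def decode_gt_allele(value):
    """Invert the encoding above: return (allele_index, phased)."""
    return (value >> 1) - 1, bool(value & 1)


# A haploid call of allele 1 has two possible encodings, differing only in the phase bit.
unphased = encode_gt_allele(1, phased=False)
phased = encode_gt_allele(1, phased=True)
print(unphased, phased)
```

Both values decode to the same allele, so a reader of the file cannot tell which one a writer is expected to emit unless the spec says so.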

VorontsovIE avatar Jul 19 '20 15:07 VorontsovIE

VCF phasing can be made explicit by using a | or / suffix, and this functionality is needed to phase a haploid variant with other non-haploid variants.

It's perfectly OK for one site to be haploid but phased with other sites that have a copy number greater than 1. Such a scenario is quite common in cancer genomics. As VCF neither requires the variant caller to be exhaustive nor makes any ploidy assumptions, a haploid call in a region of a genome with CN=1 is perfectly acceptable, even when the organism is not typically haploid.

d-cameron avatar Jul 20 '20 00:07 d-cameron

I can see two scenarios for phased haploid calls:

  1. Copy number loss causes a non-haploid region to become haploid. The caller can choose whether or not the overlapping deletion allele is included in the call.
  2. The region is, and has always been, haploid, but other regions have had copy number gains that need to be phased (e.g. due to segmental duplication, double minute formation, partial chromosomal duplication). There is no overlapping deletion allele in this case: it's a genuinely haploid site that has never been anything other than haploid.

d-cameron avatar Jul 20 '20 00:07 d-cameron