
Print PL field according to VCF specification

Open amkozlov opened this issue 4 years ago • 3 comments

Hi guys,

could you please consider aligning your VCF output with the VCF spec? It requires that the PL field contain likelihoods for all possible genotypes given the set of alleles defined in the REF and ALT fields (p. 5). In other words, for a biallelic site, the PL field must contain three values giving the likelihoods of REF/REF, REF/ALT, and ALT/ALT.
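To make the expected layout concrete, here is a minimal Python sketch of the genotype ordering the VCF spec defines for diploid per-genotype fields, where genotype j/k (with j <= k) sits at index k*(k+1)/2 + j. The function names `pl_genotype_order` and `format_pl` are hypothetical helpers written for illustration; they are not part of SCIPhI or of any VCF library.

```python
def pl_genotype_order(n_alleles):
    """List diploid genotypes (j, k) in VCF per-genotype order.

    Allele 0 is REF, alleles 1..n-1 are the ALT alleles.
    Genotype j/k with j <= k lands at index k*(k+1)//2 + j.
    """
    return [(j, k) for k in range(n_alleles) for j in range(k + 1)]


def format_pl(values):
    """Join one value per genotype into a PL string.

    None is written as '.', the VCF missing-value marker
    (hypothetical helper, shown only to illustrate the layout).
    """
    return ",".join("." if v is None else str(v) for v in values)


# Biallelic site: REF/REF, REF/ALT, ALT/ALT
print(pl_genotype_order(2))
# Only the REF/ALT value is known
print(format_pl([None, 60, None]))
```

For a site with one ALT allele this gives exactly three entries, so a PL field with a single value does not match the layout.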

I guess this might be related to #10, but here it's about fixing a format violation, not about providing additional information (i.e., if you only have a likelihood for one of the genotypes and the others are assumed to be zero, you can print something like 0/1:2:9:.,60,.)

Thanks in advance!

amkozlov, Jul 09 '20 15:07

I don't think the PL field should be used at all. See https://github.com/cbg-ethz/SCIPhI/issues/22#issuecomment-594402727

winni2k, Jul 09 '20 19:07

Oh I see, so then it should probably be changed to the PP field, which holds Phred-scaled genotype posterior probabilities according to the VCF v4.3 spec.
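As a rough illustration of what "Phred-scaled" means here, the sketch below converts posterior genotype probabilities to integer Phred values using the standard -10 * log10(p) scaling. The function name `phred_scale` and the cap of 255 are assumptions made for this example; they are not taken from SCIPhI or from the spec.

```python
import math


def phred_scale(p, cap=255):
    """Convert a probability to an integer Phred score, -10 * log10(p).

    The cap (an assumption for this sketch) keeps p == 0 finite.
    """
    if p <= 0.0:
        return cap
    return min(cap, round(-10.0 * math.log10(p)))


# Hypothetical posteriors for REF/REF, REF/ALT, ALT/ALT at one site
posteriors = [0.001, 0.9, 0.099]
pp = [phred_scale(p) for p in posteriors]
print(",".join(str(v) for v in pp))
```

Under this scaling the most probable genotype gets the smallest value, and one value per genotype is still expected.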

amkozlov, Jul 09 '20 20:07

Cool! Yes, PP looks like the right tag.

winni2k, Jul 10 '20 11:07