cyvcf2
Call genotype bases at FORMAT tags other than "GT"
I have a VCF where "GT" represents the unphased genotype, while the phased genotype is stored in another FORMAT field, such as "PG". I can obtain the alleles as numeric indices, but is it possible to convert them to the actual nucleotide bases?
for variant in vcf_file:
    print()
    print(variant.gt_bases)
    print(variant.format('GT'))
    print(variant.genotypes)
    print(variant.format('PG'))
Output:
['./.' 'T/T' './.' './.' 'T/T' 'T/C' 'T/C']
['' '\x02\x02' '' '' '\x02\x02' '\x02\x04' '\x02\x04']
[[-1, -1, False], [0, 0, False], [-1, -1, False], [-1, -1, False], [0, 0, False], [0, 1, False], [0, 1, False]]
['./.' '0/0' './.' './.' '0/0' '1|0' '0/1']
However, I would like to convert my PG genotypes to actual bases. I tried something like the following, but it doesn't work:
print(variant.format('PG').gt_bases)
print(variant.gt_bases(variant.format('PG')))
print()
I also looked at the gt_bases and gt_types properties in cyvcf2.pyx, but couldn't figure out how to apply them to a field other than GT.
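In the meantime, a manual mapping seems possible. The following is only a minimal sketch, not something cyvcf2 provides: pg_to_bases is a hypothetical helper name, and it assumes the PG values are plain strings such as '1|0' or './.' (as in the output above) whose numbers index into REF followed by the ALT alleles.

```python
import re

def pg_to_bases(pg_values, ref, alts):
    """Map numeric genotype strings such as '1|0' to bases such as 'C|T'.

    Hypothetical helper, not part of cyvcf2. Assumes allele index 0 is REF
    and indices 1.. are the ALT alleles in order.
    """
    alleles = [ref] + list(alts)
    converted = []
    for gt in pg_values:
        # Split on the separators but keep them: '1|0' -> ['1', '|', '0']
        parts = re.split(r'([/|])', gt)
        converted.append(''.join(
            # Numeric tokens become bases; '.', '/' and '|' pass through unchanged
            alleles[int(p)] if p.isdigit() else p
            for p in parts
        ))
    return converted

# Example using the PG values from the output above, with REF 'T' and ALT ['C']
print(pg_to_bases(['./.', '0/0', '1|0', '0/1'], 'T', ['C']))
```

Inside the loop this would presumably be called as pg_to_bases(variant.format('PG'), variant.REF, variant.ALT), assuming format('PG') returns one string per sample as it does in my file.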
@brentp Any update on this issue?
Thanks,