cyvcf2

Call genotype bases at FORMAT tags other than "GT"

Open everestial opened this issue 8 years ago • 1 comments

I have a VCF where "GT" represents the unphased genotype, while the phased genotype state is stored in another FORMAT field, such as "PG". I can obtain the alleles as numeric indices, but is it possible to convert them to the actual nucleotide bases?

    for variant in vcf_file:
        print()
        print(variant.gt_bases)
        print(variant.format('GT'))
        print(variant.genotypes)
        print(variant.format('PG'))

Output:

['./.' 'T/T' './.' './.' 'T/T' 'T/C' 'T/C']
['' '\x02\x02' '' '' '\x02\x02' '\x02\x04' '\x02\x04']
[[-1, -1, False], [0, 0, False], [-1, -1, False], [-1, -1, False], [0, 0, False], [0, 1, False], [0, 1, False]]
['./.' '0/0' './.' './.' '0/0' '1|0' '0/1']

However, I would like to convert my PG genotypes to actual bases. I was wondering whether something like this would work, but it doesn't:

    print(variant.format('PG').gt_bases)
    print(variant.gt_bases(variant.format('PG')))
    print()

I also looked at the gt_bases and gt_types properties in cyvcf2.pyx, but couldn't figure out how to do this.
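
One possible workaround, as a minimal sketch rather than anything built into cyvcf2: map the allele indices in each PG string onto REF and ALT by hand. The helper name `pg_to_bases` is hypothetical, and the sketch assumes that PG values are strings such as `'1|0'` (as in the output above) and that index 0 is REF while 1, 2, ... follow the ALT alleles in order.

```python
import re


def pg_to_bases(ref, alts, pg_values):
    """Translate allele-index genotype strings (e.g. '1|0') into bases.

    Hypothetical helper, not part of cyvcf2. Assumes index 0 is REF and
    indices 1..n refer to the ALT alleles in order.
    """
    alleles = [ref] + list(alts)
    converted = []
    for value in pg_values:
        # Split on '/' or '|' but keep the separator so phasing is preserved.
        parts = re.split(r"([/|])", str(value))
        out = []
        for part in parts:
            if part.isdigit():
                idx = int(part)
                # Indices outside REF/ALT are reported as missing ('.').
                out.append(alleles[idx] if idx < len(alleles) else ".")
            else:
                # Separators and missing calls ('.') pass through unchanged.
                out.append(part)
        converted.append("".join(out))
    return converted


# Values taken from the output above, assuming REF is 'T' and ALT is ['C'].
print(pg_to_bases("T", ["C"], ["./.", "0/0", "1|0", "0/1"]))
```

Inside the loop this would presumably be called as `pg_to_bases(variant.REF, variant.ALT, variant.format('PG'))`, assuming `variant.ALT` is a list of alternate alleles.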

everestial, Apr 15 '18 21:04

@brentp Any update on this issue?

Thanks,

everestial, Apr 29 '18 13:04