cyvcf2
Crash with single GT allele value - UnicodeDecodeError: 'ascii' codec can't decode byte 0x81 in position 1: ordinal not in range(128)
If a VCF record has a GT with only a single allele value (a haploid call), calling Variant.format("GT") in cyvcf2 crashes with:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x81 in position 1: ordinal not in range(128)
The VCF spec says:
Haploid calls, e.g. on Y, male non-pseudoautosomal X, or mitochondrion, are indicated by having only one allele value
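For reference, here is a minimal pure-Python sketch of what that means when reading a GT string: a diploid call like "0/1" has two allele indices, while a haploid call like "1" has just one and no separator. The function name parse_gt is hypothetical and not part of cyvcf2; it assumes a single separator type per call and ignores the VCF 4.4 leading phase prefix.

```python
import re


def parse_gt(gt):
    """Parse a VCF GT string such as "0/1", "1|0" or the haploid "1".

    Returns (alleles, phased): allele indices as ints, with None for a
    missing allele ("."). A haploid call yields a one-element list.
    Simplification (assumption): one separator type per call, and no
    VCF 4.4 leading phase prefix.
    """
    # Split on either the unphased "/" or phased "|" separator;
    # a haploid value has no separator, so it stays a single item.
    alleles = [None if a == "." else int(a) for a in re.split(r"[/|]", gt)]
    phased = "|" in gt
    return alleles, phased
```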
Example file:
Test code (using the VCF above):
In [1]: from cyvcf2 import VCF
In [2]: reader = VCF("./single_gt.vcf.gz")
In [3]: v = next(iter(reader))
In [4]: v.format("GT")
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
Cell In [4], line 1
----> 1 v.format("GT")
File /usr/local/lib/python3.10/dist-packages/cyvcf2/cyvcf2.pyx:1353, in cyvcf2.cyvcf2.Variant.format()
UnicodeDecodeError: 'ascii' codec can't decode byte 0x81 in position 1: ordinal not in range(128)
Hi Dave,
Don't use format() for GT. Use variant.genotype for an object or variant.genotypes for an array.
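If you still want a GT string per sample (e.g. for output), a rough sketch is to rebuild it from variant.genotypes. This assumes each per-sample entry is a list of allele indices followed by a phased boolean, with -1 for a missing allele, so a haploid call would look like [1, False]. The helper name genotype_to_gt_string is hypothetical, and rendering every negative index as "." is an assumption; check how your cyvcf2 version encodes padding for mixed-ploidy samples.

```python
def genotype_to_gt_string(genotype):
    """Turn one cyvcf2-style genotype entry, e.g. [0, 1, True] or the
    haploid [1, False], back into a VCF GT string like "0|1" or "1".

    Assumption: the last element is the phased flag and negative allele
    indices mean "missing" (rendered as ".").
    """
    *alleles, phased = genotype
    sep = "|" if phased else "/"
    # A haploid entry has one allele, so join() emits no separator.
    return sep.join("." if a < 0 else str(a) for a in alleles)
```

For example, [genotype_to_gt_string(g) for g in v.genotypes] would give one GT string per sample.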
Thanks, will do that as a workaround.
I usually use the specific methods (which work fine); in this case I wanted to store all FORMAT fields in JSON.
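A minimal sketch of that use case with the workaround applied: loop over the record's FORMAT keys, read GT through .genotypes and everything else through .format(). It assumes a cyvcf2-style Variant exposing .FORMAT (list of key names), .genotypes and .format(field) returning a numpy array or None; the function names format_fields_to_dict and format_fields_to_json are hypothetical.

```python
import json


def format_fields_to_dict(variant):
    """Collect every FORMAT field of a cyvcf2-style Variant into a
    JSON-serialisable dict, reading GT via .genotypes instead of
    .format("GT") (which crashes on haploid calls)."""
    record = {}
    for field in variant.FORMAT:
        if field == "GT":
            # Per-sample lists of allele indices plus a phased flag.
            record["GT"] = [list(g) for g in variant.genotypes]
            continue
        values = variant.format(field)
        if values is None:
            record[field] = None
        elif hasattr(values, "tolist"):
            # numpy array -> nested lists of builtin Python types
            record[field] = values.tolist()
        else:
            # plain sequences of per-sample values
            record[field] = [list(v) for v in values]
    return record


def format_fields_to_json(variant):
    """Serialise all FORMAT fields of one record to a JSON string."""
    return json.dumps(format_fields_to_dict(variant))
```

With cyvcf2 this would be called as format_fields_to_json(v) for each v in reader. Note that missing float values come back from numpy as NaN, which json.dumps writes as the non-standard token NaN.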