cyvcf2

Crash with single GT allele value - UnicodeDecodeError: 'ascii' codec can't decode byte 0x81 in position 1: ordinal not in range(128)

Open davmlaw opened this issue 1 year ago • 2 comments

If a VCF has a GT with a single allele value (a haploid call), calling format("GT") in cyvcf2 crashes with:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x81 in position 1: ordinal not in range(128)

The VCF spec says:

Haploid calls, e.g. on Y, male non-pseudoautosomal X, or mitochondrion, are indicated by having only one allele value

Example file:

single_gt.vcf.gz

Test code (using the VCF above):

In [1]: from cyvcf2 import VCF

In [2]: reader = VCF("./single_gt.vcf.gz")

In [3]: v = next(iter(reader))

In [4]: v.format("GT")
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Cell In [4], line 1
----> 1 v.format("GT")

File /usr/local/lib/python3.10/dist-packages/cyvcf2/cyvcf2.pyx:1353, in cyvcf2.cyvcf2.Variant.format()

UnicodeDecodeError: 'ascii' codec can't decode byte 0x81 in position 1: ordinal not in range(128)

davmlaw, Dec 22 '23 07:12
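One plausible reading of the traceback, not confirmed anywhere in this thread: in the BCF encoding, GT is stored as small integers, with each allele written as `(allele + 1) << 1 | phased`, and a sample with fewer alleles than the widest sample is padded with the int8 end-of-vector sentinel `0x81`. If those raw bytes are decoded as ASCII text, the sentinel is exactly the kind of byte that would trigger this error. The sketch below is plain Python and does not call cyvcf2; the helper name `decode_gt_bytes` and the sample bytes are made up for illustration.

```python
# int8 end-of-vector sentinel from the BCF encoding (-127 as a signed byte).
INT8_VECTOR_END = 0x81


def decode_gt_bytes(raw):
    """Decode BCF-style int8 GT values into (allele, phased) pairs.

    Hypothetical helper for illustration only; not part of cyvcf2.
    """
    calls = []
    for value in raw:
        if value == INT8_VECTOR_END:
            break  # padding after a haploid call, no more alleles
        allele = (value >> 1) - 1  # -1 means a missing allele (".")
        phased = bool(value & 1)
        calls.append((allele, phased))
    return calls


# Assumed example: haploid call "1" padded out to diploid width.
raw = bytes([0x04, 0x81])
print(decode_gt_bytes(raw))

# Treating the same bytes as ASCII text fails on the sentinel byte.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc)
```

Whether cyvcf2's `format()` actually takes this path for GT is an assumption; the sketch only shows that the byte value and position in the error message are consistent with a padded haploid genotype.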

Hi Dave, don't use format for GT. Use variant.genotype for an object or variant.genotypes for an array.

brentp, Dec 22 '23 14:12
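A minimal sketch of working with the suggested `variant.genotypes` output, assuming each per-sample entry has the shape `[allele, ..., phased]` (one allele for a haploid call, two for a diploid call, negative values for missing alleles). The helper `gt_to_string` is hypothetical and only turns such an entry back into a VCF-style GT string; it is not part of cyvcf2.

```python
def gt_to_string(gt):
    """Render one genotypes entry, e.g. [0, 1, True], as a GT string.

    Assumes the last element is the phased flag and the rest are allele
    indices, with negative values standing for a missing allele.
    """
    *alleles, phased = gt
    separator = "|" if phased else "/"
    return separator.join("." if a < 0 else str(a) for a in alleles)


# With cyvcf2 this would be applied per sample, for example:
#   gt_strings = [gt_to_string(g) for g in v.genotypes]
print(gt_to_string([1, False]))     # haploid call
print(gt_to_string([0, 1, True]))   # phased diploid call
```

Because the alleles are unpacked with a starred assignment, the same function handles haploid and diploid entries without a special case.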

Thanks - will do that as a workaround.

I usually use the specific methods (which work fine), but in this case I wanted to store all FORMAT fields in JSON.

davmlaw, Jan 02 '24 01:01
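For the "all FORMAT fields in JSON" use case, one possible approach is to special-case GT and read every other key through `format()`. The sketch below assumes the variant object exposes `FORMAT` (a list of keys), `genotypes`, and `format(key)` returning something with a `tolist()` method, as cyvcf2 variants are understood to do. The function name `format_fields_to_json` and the `FakeVariant` stand-in are hypothetical and exist only so the example runs without a VCF file.

```python
import json


def format_fields_to_json(variant):
    """Collect all FORMAT fields of one variant into a JSON string.

    GT is taken from variant.genotypes rather than variant.format("GT"),
    which is the call that fails in this issue.
    """
    fields = {}
    for key in variant.FORMAT:
        if key == "GT":
            fields[key] = [list(g) for g in variant.genotypes]
        else:
            values = variant.format(key)
            # numpy arrays are not JSON serializable, plain lists are.
            fields[key] = values.tolist() if hasattr(values, "tolist") else values
    return json.dumps(fields)


class FakeVariant:
    """Stand-in for a cyvcf2 variant, for illustration only."""

    FORMAT = ["GT", "DP"]
    genotypes = [[1, False]]  # one sample, haploid call

    def format(self, key):
        return [[12]]


print(format_fields_to_json(FakeVariant()))
```

Storing GT as the raw allele list plus phased flag keeps haploid and diploid samples distinguishable in the JSON without any string parsing later.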