sgkit
read_plink returns bytes for variant_allele rather than unicode strings
I don't think there's a good reason to return bytes rather than UTF-8 unicode strings. It can only lead to bugs in user code and inconsistencies in string handling (anyone remember Python 2???)
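Here is a minimal sketch of the kind of bug this causes. It uses plain NumPy and a hand-built array shaped like the `variant_allele` values in this issue, not data read through sgkit. In Python 3, `bytes` never compares equal to `str`, so a natural-looking check fails without raising any error:

```python
import numpy as np

# Hand-built stand-in for variant_allele.values (fixed-width bytes, dtype S1)
alleles = np.array([[b"A", b"G"], [b"T", b"C"]], dtype="S1")

# .tolist() turns the row into plain Python bytes objects: [b'A', b'G']
first_variant = alleles[0].tolist()

# bytes never equals str in Python 3, so this check quietly returns False
print("A" in first_variant)   # False

# It only succeeds if the caller remembers to use a bytes literal
print(b"A" in first_variant)  # True
```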
To reproduce, using the "example" PLINK dataset from the test suite:
import sgkit.io.plink

# `path` is the location of the example PLINK fileset (not shown here)
sg_ds = sgkit.io.plink.read_plink(path=path)
print(sg_ds.variant_allele.values)
print(sg_ds.variant_allele)
This gives:
[[b'A' b'G']
 [b'T' b'C']]
<xarray.DataArray 'variant_allele' (variants: 2, alleles: 2)>
dask.array<astype, shape=(2, 2), dtype=|S1, chunksize=(2, 1), chunktype=numpy.ndarray>
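Until the reader returns unicode, a caller-side workaround is to decode the `|S1` values. The sketch below assumes UTF-8 encoding and uses a hypothetical helper, `decode_alleles`, built on `numpy.char.decode`. It runs on an in-memory NumPy copy of the values shown above rather than on the lazy dask array:

```python
import numpy as np

def decode_alleles(alleles: np.ndarray, encoding: str = "utf-8") -> np.ndarray:
    """Hypothetical helper: return a unicode ('U') copy of a bytes ('S') array."""
    if alleles.dtype.kind != "S":
        # Already unicode (or not a bytes array at all): return it unchanged
        return alleles
    # Decode every element, preserving the (variants, alleles) shape
    return np.char.decode(alleles, encoding)

# Stand-in for sg_ds.variant_allele.values as printed above
raw = np.array([[b"A", b"G"], [b"T", b"C"]], dtype="S1")
decoded = decode_alleles(raw)
print(decoded.tolist())  # [['A', 'G'], ['T', 'C']]
```

For the xarray/dask object itself, calling `astype(str)` on the DataArray is probably the more natural route because it stays lazy. That path is untested here.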