sgkit
simulate_genotype_call_dataset creates alleles as byte strings
import sgkit as sg

ds = sg.simulate_genotype_call_dataset(n_variant=2, n_sample=4, missing_pct=0, phased=True, seed=1)
# Print the allele array for each variant site
for i, alleles in enumerate(ds['variant_allele'].values):
    print(f"Site {i}: {alleles}")
The alleles come back as byte strings, e.g. [b'T' b'C'] (dtype |S1), whereas I was expecting Unicode strings (dtype <U1). Is this intentional?
I think this is a bug, which is probably related to:
- https://github.com/sgkit-dev/vcf-zarr-spec/issues/14
- https://github.com/sgkit-dev/sgkit/pull/1208
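
In the meantime, a possible workaround is to cast the array to a Unicode dtype with NumPy. The sketch below uses a hand-made |S1 array as a stand-in for ds['variant_allele'].values (the allele values are made up), so it only shows the dtype conversion itself and does not call sgkit:

```python
import numpy as np

# Stand-in for ds['variant_allele'].values: a (variants, alleles) array of
# single-byte strings. These allele values are illustrative, not sgkit output.
alleles_bytes = np.array([[b"T", b"C"], [b"A", b"G"]], dtype="S1")

# Casting a bytes array to str decodes each element as ASCII and yields a
# Unicode dtype of the same width.
alleles_str = alleles_bytes.astype(str)

for i, alleles in enumerate(alleles_str):
    print(f"Site {i}: {alleles}")

print(alleles_bytes.dtype.kind, alleles_str.dtype.kind)
```

I would expect the same cast to work on the dataset variable directly, e.g. ds['variant_allele'].astype(str), but I have not checked that against the rest of sgkit, so treat that part as an assumption.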