simulate_genotype_call_dataset creates alleles as byte strings

Open • hyanwong opened this issue 1 year ago • 1 comment

import sgkit as sg

# Simulate a small phased dataset and print the alleles stored for each site
ds = sg.simulate_genotype_call_dataset(n_variant=2, n_sample=4, missing_pct=0, phased=True, seed=1)
for i, alleles in enumerate(ds['variant_allele'].values):
    print(f"Site {i}: {alleles}")

Alleles are e.g. [b'T' b'C'] (dtype |S1). I was expecting them to be dtype <U1. Is this intentional?
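
To make the difference concrete, here is a minimal NumPy-only sketch. It does not call sgkit: the `alleles_bytes` array is a hand-written stand-in for `ds['variant_allele'].values`, and both variable names are hypothetical. It shows one possible stopgap, casting the byte-string array to a unicode array with `astype`, assuming the alleles are plain ASCII.

```python
import numpy as np

# Hypothetical stand-in for ds['variant_allele'].values (2 sites, 2 alleles each),
# stored as fixed-width byte strings (dtype |S1) as described above.
alleles_bytes = np.array([[b"T", b"C"], [b"A", b"G"]], dtype="S1")

# Cast to fixed-width unicode strings; this works for ASCII-only data.
alleles_str = alleles_bytes.astype("U1")

# dtype.kind is 'S' for byte strings and 'U' for unicode strings.
print(alleles_bytes.dtype.kind, alleles_str.dtype.kind)
for i, alleles in enumerate(alleles_str):
    print(f"Site {i}: {alleles}")
```

This only converts a copy of the values after the fact; it does not change what the simulation function itself stores.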

hyanwong • Jun 07 '24 08:06

I think this is a bug, which is probably related to:

  • https://github.com/sgkit-dev/vcf-zarr-spec/issues/14
  • https://github.com/sgkit-dev/sgkit/pull/1208

jeromekelleher • Jun 07 '24 08:06