Results: 498 comments by Tom White

> In terms of parallelism by variants, this is definitely a weakness in the tskit API at the moment, we want to add some way of doing this well. BTW...

> If I use `window_by_variant` then I'd expect the reported base length to be the distance between the first and last variant of each window.

Agreed. This should be fairly...

> Just a thought, would it be worth adding `window_base_start` and `window_base_stop` instead?

Yes, that's better. Perhaps `window_start_position` and `window_stop_position` to echo the `variant_position` variable? This will take us into...
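To make the windowing discussion concrete, here is a minimal sketch in plain Python of how per-window positions and base lengths could be derived from variant positions. The helper `window_positions` and the sample data are hypothetical and not part of any library API. Treating the stop position as the position of the last variant in the window (inclusive) is an assumption made only for this illustration.

```python
def window_positions(variant_position, window_size):
    """Split variants into windows of `window_size` variants each and
    report the genomic position of the first and last variant per window.

    Hypothetical helper for illustration only.
    """
    windows = []
    for start in range(0, len(variant_position), window_size):
        chunk = variant_position[start:start + window_size]
        first, last = chunk[0], chunk[-1]
        windows.append(
            {
                "window_start_position": first,
                # Assumption: stop is the last variant's position (inclusive).
                "window_stop_position": last,
                # Base length as the distance between first and last variant.
                "base_length": last - first,
            }
        )
    return windows


# Made-up variant positions on a single contig, sorted ascending.
positions = [10, 25, 40, 100, 130, 200, 210]
result = window_positions(positions, window_size=3)
for w in result:
    print(w)
```

Note that the final window holds a single variant, so its base length under this definition is zero. That edge case is one reason the choice between variant-based and position-based bounds matters.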

> Could we (in principle) do an encoding ourselves and expose the `alleles` dataset variable as a numpy object array (or similar)?

Yes, I think it would be worth trying...

Somewhat related (but maybe best discussed in a separate issue): it's currently quite awkward to get contig names (rather than indexes) when looking at summaries of the data, or for...
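As a rough illustration of the lookup involved, the sketch below maps per-variant contig indexes to names and summarises variants per contig. It uses plain Python lists. The names `contigs`, `variant_contig`, and `variant_contig_name`, along with the sample values, are assumptions for this example and do not describe the actual dataset layout.

```python
from collections import Counter

# Assumption: contig names are stored as an ordered list, and each variant
# carries an integer index into that list.
contigs = ["chr1", "chr2", "chrX"]
variant_contig = [0, 0, 1, 2, 2, 2]

# Translate indexes to names so that summaries are human-readable.
variant_contig_name = [contigs[i] for i in variant_contig]

# Count variants per contig by name rather than by index.
counts = Counter(variant_contig_name)
for name in contigs:
    print(name, counts[name])
```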

> we'll make life miserable/impossible for Windows users if don't gate these libraries

Plink and bgen already work on Windows (and have very small wheels), so it's only VCF that's...

> There will be headaches, I'm sure of it. 😄

> I'll start the ball rolling there after we get the next release out?

So the next release should not...

Thanks for the suggestion @elswob! This looks like it might be a useful guide to building such an image: https://uwekorn.com/2021/03/01/deploying-conda-environments-in-docker-how-to-do-it-right.html

> It would be worth investigating how we could use Zarr's [variable-length strings](https://zarr.readthedocs.io/en/stable/tutorial.html#string-arrays)

In the case of writing Zarr sequentially from VCF, we already do this (it can be enforced...

I'm not sure how to fix this, so I opened https://github.com/pydata/xarray/discussions/5769 with a minimal example.