Tom White
> In terms of parallelism by variants, this is definitely a weakness in the tskit API at the moment, we want to add some way of doing this well. BTW...
> If I use `window_by_variant` then I'd expect the reported base length to be the distance between the first and last variant of each window.

Agreed. This should be fairly...
> Just a thought, would it be worth adding `window_base_start` and `window_base_stop` instead?

Yes, that's better. Perhaps `window_start_position` and `window_stop_position` to echo the `variant_position` variable? This will take us into...
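To make the proposal concrete, here is a minimal sketch of how per-window start and stop positions could be derived when windowing by variant count. It is plain Python rather than the actual implementation: the helper `window_positions` is hypothetical, the dictionary keys simply reuse the variable names suggested above, and treating the stop position as the position of the last variant in the window (inclusive) is an assumption.

```python
def window_positions(variant_position, size):
    """Group variants into windows of `size` variants and report each
    window's positional extent (hypothetical helper, not library API)."""
    windows = []
    for start in range(0, len(variant_position), size):
        # The final window may hold fewer than `size` variants.
        stop = min(start + size, len(variant_position))
        first = variant_position[start]
        last = variant_position[stop - 1]  # assumption: stop is inclusive
        windows.append(
            {
                "window_start_position": first,
                "window_stop_position": last,
                # Distance between the first and last variant of the window.
                "base_length": last - first,
            }
        )
    return windows


positions = [10, 25, 40, 100, 130, 500, 520]
result = window_positions(positions, size=3)
for window in result:
    print(window)
```

With these positions the three windows have base lengths 30, 400 and 0. The last case shows that a single-variant window has zero length under this definition, which may or may not be the desired behaviour.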
> Could we (in principle) do an encoding ourselves and expose the `alleles` dataset variable as a numpy object array (or similar)?

Yes, I think it would be worth trying...
Somewhat related (but maybe best discussed in a separate issue): it's currently quite awkward to get contig names (rather than indexes) when looking at summaries of the data, or for...
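To illustrate the awkwardness, here is a rough sketch of the manual lookup needed today to summarise by contig name. It uses plain Python lists as stand-ins: `contigs` (the ordered contig names) and `variant_contig` (one contig index per variant) are assumed example data, not read from a real dataset.

```python
from collections import Counter

# Assumed example data: contig names in index order, and a per-variant
# contig index pointing into that list.
contigs = ["chr1", "chr2", "chrX"]
variant_contig = [0, 0, 1, 2, 2, 2]

# Translate each index into its name before summarising.
variant_contig_name = [contigs[i] for i in variant_contig]

# Number of variants per contig, keyed by name rather than index.
counts = Counter(variant_contig_name)
print(dict(counts))
```

Every summary has to repeat this index-to-name translation, which is the step a built-in variable or helper could remove.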
> we'll make life miserable/impossible for Windows users if don't gate these libraries

Plink and bgen already work on Windows (and have very small wheels), so it's only VCF that's...
> There will be headaches, I'm sure of it. 😄

> I'll start the ball rolling there after we get the next release out?

So the next release should not...
Thanks for the suggestion @elswob! This looks like it might be a useful guide to building such an image: https://uwekorn.com/2021/03/01/deploying-conda-environments-in-docker-how-to-do-it-right.html
> It would be worth investigating how we could use Zarr's [variable-length strings](https://zarr.readthedocs.io/en/stable/tutorial.html#string-arrays)

In the case of writing Zarr sequentially from VCF, we already do this (it can be enforced...
I'm not sure how to fix this, so I opened https://github.com/pydata/xarray/discussions/5769 with a minimal example.