Results: 132 issues by Tom White

Do the same for `filters` and `contig_lengths`, as suggested by @benjeffery here: https://github.com/pystatgen/sgkit/pull/1054#pullrequestreview-1360645992

IO

VCF to Zarr conversion is an embarrassingly parallel process that currently uses Dask Delayed to schedule tasks. It would be fairly easy to make it possible to run on any...

IO
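
Because each region of a VCF can be converted independently, the scheduling layer is in principle swappable. The following is a minimal sketch of that idea using only `concurrent.futures` from the standard library rather than Dask Delayed. `convert_region`, `convert_all`, and the region strings are hypothetical stand-ins for illustration and are not sgkit functions.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def convert_region(region: str) -> str:
    """Hypothetical stand-in for converting one VCF region to a Zarr part."""
    # A real task would read the region and write a Zarr store; this sketch
    # only returns the name of the part that would be written.
    return f"part-{region}.zarr"


def convert_all(regions: list[str], executor: Executor) -> list[str]:
    """Run one independent task per region on whichever executor is supplied."""
    # Executor.map preserves input order, so parts line up with regions.
    return list(executor.map(convert_region, regions))


regions = ["chr1:1-1000", "chr1:1001-2000", "chr2:1-1000"]
with ThreadPoolExecutor(max_workers=2) as pool:
    parts = convert_all(regions, pool)
```

Since `convert_all` only depends on the `Executor` interface, a process pool or another compatible executor could be passed in without changing the task function.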

The benchmark workflow stopped working about 5 days ago: https://github.com/pystatgen/sgkit/actions/workflows/benchmark.yml

process + tools

I've been thinking about how we could run (parts of) sgkit on Cubed (#908). One thing that would help is using [`xarray.map_blocks`](https://docs.xarray.dev/en/stable/generated/xarray.map_blocks.html#xarray.map_blocks) (or [`xarray.apply_ufunc`](https://docs.xarray.dev/en/stable/generated/xarray.apply_ufunc.html)) instead of [`dask.array.map_blocks`](https://docs.dask.org/en/stable/generated/dask.array.map_blocks.html), since the Xarray...

dispatching

When Zarr variable chunking ([ZEP 3](https://zarr.dev/zeps/draft/ZEP0003.html)) is available we would be able to write partitions of a VCF directly into Zarr chunks that vary in size along the variants dimension....

IO
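
To illustrate what chunks that vary in size along the variants dimension would mean in practice, the sketch below computes the variant index range each partition would occupy if every partition became exactly one chunk. This is an assumption about how such a layout could be described, not the ZEP 3 API or sgkit code; `chunk_slices` and the partition sizes are hypothetical.

```python
from itertools import accumulate


def chunk_slices(partition_sizes):
    """Return (start, end) variant index ranges, one per partition."""
    # The end of each chunk is the running total of variants seen so far.
    ends = list(accumulate(partition_sizes))
    # Each chunk starts where the previous one ended.
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


# Hypothetical numbers of variants found in three VCF partitions.
partition_sizes = [1200, 950, 1430]
slices = chunk_slices(partition_sizes)
# slices == [(0, 1200), (1200, 2150), (2150, 3580)]
```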

There is a new plan to bring Zarr up to date with the V3 spec over the next few months (see https://github.com/zarr-developers/zarr-python/discussions/1480). This issue is to run sgkit tests against...

IO
upstream

This was deprecated in #1054, and should be removed in a future release. Similarly for `filters` and `contig_lengths`.

IO
data representation

Currently `read_plink` maps the IID from the fam file to sgkit's `sample_id`. However, IID is only unique within the family ID (FID), so there is the potential for conflicts. (I...

IO
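
The potential conflict can be shown with a small standard-library sketch: the first two whitespace-delimited columns of a PLINK `.fam` line are FID and IID, and the same IID may appear under different FIDs. Joining the two with a separator is only one possible way to obtain unique sample IDs and is an assumption here, not what `read_plink` does. The helper names and the sample data are hypothetical.

```python
from collections import Counter


def parse_fam_ids(lines):
    """Return (FID, IID) pairs from the first two columns of .fam lines."""
    return [tuple(line.split()[:2]) for line in lines if line.strip()]


def duplicated_iids(ids):
    """Return the IIDs that occur more than once across all families."""
    counts = Counter(iid for _, iid in ids)
    return sorted(iid for iid, n in counts.items() if n > 1)


def combined_sample_ids(ids, sep="_"):
    """Build one candidate unique ID per sample by joining FID and IID."""
    return [f"{fid}{sep}{iid}" for fid, iid in ids]


# Hypothetical .fam content: IID "1" appears in both FAM1 and FAM2.
fam_lines = [
    "FAM1 1 0 0 1 -9",
    "FAM1 2 0 0 2 -9",
    "FAM2 1 0 0 1 -9",
]
ids = parse_fam_ids(fam_lines)
conflicts = duplicated_iids(ids)
sample_ids = combined_sample_ids(ids)
```

Here `conflicts` contains the IID shared between the two families, while the combined IDs are all distinct.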

Follow-on from the discussion in #953. Get the round-trip tests to work as far as possible. Part of #924.

enhancement
IO