VirtualiZarr Rechunk method for uncompressed arrays

@rsignell this is inspired by your blog post (still a WIP for now)

The idea is that you can simply do

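from virtualizarr import open_virtual_dataset
vds = open_virtual_dataset('uncompressed_netcdf3.nc')
# split the existing chunks into one chunk per time step
subchunked_vds = vds.chunk(time=1)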
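For context, here is a rough sketch of why this is possible for uncompressed data: the byte range of each sub-chunk can be computed directly from the original chunk's offset, with no decoding. This is not the code in this PR. `split_leading_axis` is a hypothetical helper, and it assumes a single uncompressed, C-ordered chunk with no filters, split only along its first axis.

```python
from math import prod


def split_leading_axis(offset, shape, itemsize, rows_per_chunk):
    """Return (offset, length) byte ranges for sub-chunks along axis 0.

    Hypothetical helper for illustration only. Assumes one uncompressed,
    C-ordered (row-major) chunk starting at byte `offset` in the file.
    """
    if rows_per_chunk <= 0:
        raise ValueError("rows_per_chunk must be a positive integer")
    # Bytes occupied by one index along axis 0 (e.g. one time step).
    row_nbytes = itemsize * prod(shape[1:])
    ranges = []
    for start in range(0, shape[0], rows_per_chunk):
        # The last sub-chunk may be shorter than the others.
        nrows = min(rows_per_chunk, shape[0] - start)
        ranges.append((offset + start * row_nbytes, nrows * row_nbytes))
    return ranges
```

For example, a float32 chunk of shape (3, 2, 2) starting at byte 100 would split into three 16-byte ranges with `split_leading_axis(100, (3, 2, 2), 4, 1)`. Splitting along other axes needs more care, since those sub-chunks are generally not contiguous on disk.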

[x] Closes #86
[x] Tests added
[x] Tests passing
[x] Full type hint coverage
[ ] Changes are documented in docs/releases.rst
[ ] New functions/methods are listed in api.rst
[ ] New functionality has documentation

Jul 22 '24 16:07 TomNicholas

This now works in the sense that the .rechunk method on the ManifestArray class passes dedicated tests (and we can rechunk in however many dimensions we want!), but the new integration test fails because Dataset.chunk is dispatching to dask's version of .rechunk somewhere. This may require a change to xarray's ChunkManagerEntrypoint upstream to fix.

Jul 23 '24 08:07 TomNicholas

Note to self: we should add more validation to the ZArray class to check that the chunks attribute is a tuple of positive integers, and move the zarray.replace call to the start of the method to catch invalid input early.
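Something along these lines is what I have in mind. This is only a sketch: `validate_chunks` is a hypothetical standalone function, not the current ZArray code, and where exactly it would be called from is still open.

```python
from numbers import Integral


def validate_chunks(chunks):
    """Check that `chunks` is a tuple of positive integers and return it.

    Hypothetical helper for illustration; not part of the ZArray class.
    """
    if not isinstance(chunks, tuple):
        raise TypeError(f"chunks must be a tuple, got {type(chunks).__name__}")
    for c in chunks:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(c, bool) or not isinstance(c, Integral):
            raise TypeError(f"chunk lengths must be integers, got {c!r}")
        if c <= 0:
            raise ValueError(f"chunk lengths must be positive, got {c!r}")
    return chunks
```

Running this before any manifest manipulation means bad input such as `chunks=(0, 10)` or `chunks=[1, 10]` fails immediately with a clear message.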

Jul 23 '24 08:07 TomNicholas

> we can rechunk in however many dimensions we want!

I love that aspect!

Jul 23 '24 20:07 rsignell

> but the new integration test fails because Dataset.chunk is dispatching to dask's version of .rechunk somewhere. This may require a change to xarray's ChunkManagerEntrypoint upstream to fix.

https://github.com/pydata/xarray/pull/9286 has now progressed far enough that this PR works for me at least (when using that xarray branch)! Passing all tests locally 🟢

Jul 30 '24 00:07 TomNicholas