Rechunk method for uncompressed arrays
@rsignell this is inspired by your blog post (still a WIP for now)
The idea is that you can simply do
```python
vds = open_virtual_dataset('uncompressed_netcdf3.nc')
subchunked_vds = vds.chunk(time=1)
```
- [x] Closes #86
- [x] Tests added
- [x] Tests passing
- [x] Full type hint coverage
- [ ] Changes are documented in `docs/releases.rst`
- [ ] New functions/methods are listed in `api.rst`
- [ ] New functionality has documentation
This now works in the sense that the .rechunk method on the ManifestArray class passes dedicated tests (and we can rechunk in however many dimensions we want!), but the new integration test fails because Dataset.chunk is dispatching to dask's version of .rechunk somewhere. This may require a change to xarray's ChunkManagerEntrypoint upstream to fix.
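For reference, the core trick for uncompressed data is just byte-range arithmetic on the chunk manifest: a netCDF3 variable is one contiguous, C-ordered block of bytes, so splitting along the outermost axis yields sub-chunks that are themselves contiguous byte ranges. Here is a minimal sketch of that arithmetic for the outer-axis case only; the function name and signature are illustrative, not the actual `ManifestArray.rechunk` implementation:

```python
import numpy as np


def split_outer_axis(offset: int, length: int, shape: tuple[int, ...],
                     new_outer: int, itemsize: int) -> list[tuple[int, int]]:
    """Hypothetical helper: split one uncompressed, C-ordered chunk that
    occupies `length` bytes starting at `offset` into sub-chunks of size
    `new_outer` along the outermost axis, returning (offset, length) pairs.
    """
    if shape[0] % new_outer != 0:
        raise ValueError("new chunk size must evenly divide the old one")
    # bytes per sub-chunk: new_outer "rows", each of prod(shape[1:]) elements
    sub_nbytes = new_outer * int(np.prod(shape[1:], dtype=np.int64)) * itemsize
    n_subchunks = shape[0] // new_outer
    assert n_subchunks * sub_nbytes == length
    return [(offset + i * sub_nbytes, sub_nbytes) for i in range(n_subchunks)]
```

For example, a `(12, 720, 1440)` float32 chunk rechunked to `time=1` becomes 12 sub-chunks of `720 * 1440 * 4` bytes each, at consecutive offsets into the same file.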
Note to self: we should add more validation to the ZArray class to check that the chunks attribute is a tuple of positive integers, and move the zarray.replace call to the start of the method to catch invalid input early.
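As a rough sketch of the kind of check meant here (an assumption about how it might look, not the existing ZArray code):

```python
def _validate_chunks(chunks) -> tuple[int, ...]:
    # Illustrative validation only: ensure chunks is a tuple of positive ints.
    if not isinstance(chunks, tuple) or not all(
        isinstance(c, int) and c > 0 for c in chunks
    ):
        raise TypeError(
            f"chunks must be a tuple of positive integers, got {chunks!r}"
        )
    return chunks
```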
> we can rechunk in however many dimensions we want!
I love that aspect!
> but the new integration test fails because Dataset.chunk is dispatching to dask's version of .rechunk somewhere. This may require a change to xarray's ChunkManagerEntrypoint upstream to fix.
https://github.com/pydata/xarray/pull/9286 has now progressed far enough that this PR works for me at least (when using that xarray branch)! Passing all tests locally 🟢