
Feature request: 'auto' in `target_chunks`

Open shz9 opened this issue 5 years ago • 3 comments

Hi,

Thanks for the great package! I'm currently using it in one of my projects to rechunk large symmetric matrices along a given axis. However, I'm missing a feature that I liked in Dask: automatically determining the chunk size for a given dimension. For example, say that I have the following use case:

```python
import zarr
import dask.array as da

d = da.ones((10000, 10000))
# Let dask pick the chunk size along axis 0; keep axis 1 chunks unchanged.
d = d.rechunk({0: 'auto', 1: None})
d.to_zarr('my_store.zarr')
```

Is it possible to add a feature to accomplish the same thing in rechunker? I'm currently doing it the hacky way:

```python
import psutil
import zarr
import dask.array as da
from rechunker import rechunk

d = da.ones((10000, 10000))
d.to_zarr('my_store.zarr')
z = zarr.open('my_store.zarr')

...  # target_store and intermediate_store are defined here

rechunked = rechunk(z,
                    # Borrow dask's 'auto' logic to compute the target chunk shape.
                    target_chunks=d.rechunk({0: 'auto', 1: None}).chunksize,
                    target_store=target_store,
                    temp_store=intermediate_store,
                    # Per-worker memory budget in bytes (cast to int).
                    max_mem=int(psutil.virtual_memory().available / psutil.cpu_count()))

rechunked.execute()
```

Hope this makes sense. Thanks!

shz9 · Dec 09 '20 08:12

This is a great idea, and I'd love to support it.

One question: how does dask determine the chunk size in the 'auto' dimensions? Do we feel that the same logic is appropriate in rechunker?

If so, we can probably just reuse dask's `normalize_chunks` function to implement this.
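For reference, here is a minimal sketch of calling `normalize_chunks` directly, assuming it is imported from `dask.array.core`. The shape, dtype, and the 64 MiB `limit` are illustrative assumptions, and `-1` is used for the full-length dimension.

```python
import numpy as np
from dask.array.core import normalize_chunks

shape = (10000, 10000)
dtype = np.dtype("float64")
limit = 64 * 2**20  # 64 MiB per chunk, chosen only for illustration

# 'auto' along axis 0, a single full-length chunk (-1) along axis 1.
chunks = normalize_chunks({0: "auto", 1: -1}, shape=shape, limit=limit, dtype=dtype)

# The result lists every block per dimension: ((n, n, ..., r), (10000,)).
print(chunks[1])       # → (10000,)
print(max(chunks[0]))  # the row-chunk size dask picked under the limit
```

Note that a `dtype` is required whenever `'auto'` appears, because the byte limit must be converted into a number of elements.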

rabernat · Dec 09 '20 15:12

I think the relevant function from Dask is here: https://github.com/dask/dask/blob/a988716cfeb3a9b1015d14a334368e70ae382553/dask/array/core.py#L2709

I believe it depends on a configurable limit on the chunk size, `config.get("array.chunk-size")`, which could easily be incorporated into the `rechunk` function. Reusing `normalize_chunks` would also work fine, since it handles many other cases (e.g. `-1` or `None` for some dimensions).
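One possible shape for this, sketched under assumptions: a hypothetical helper `resolve_auto_chunks` (not part of rechunker) that falls back to dask's `array.chunk-size` setting when no explicit limit is given, runs `normalize_chunks`, and returns one chunk size per dimension, which is the form rechunker's `target_chunks` accepts.

```python
import dask
import numpy as np
from dask.array.core import normalize_chunks
from dask.utils import parse_bytes


def resolve_auto_chunks(target_chunks, shape, dtype, limit=None):
    """Hypothetical helper: turn a dask-style chunk spec that may contain
    'auto' or -1 into a plain tuple of chunk sizes for rechunker."""
    if limit is None:
        # Fall back to dask's configurable chunk-size limit, e.g. "128MiB".
        limit = dask.config.get("array.chunk-size")
    if isinstance(limit, str):
        limit = parse_bytes(limit)
    chunks = normalize_chunks(
        target_chunks, shape=shape, limit=limit, dtype=np.dtype(dtype)
    )
    # normalize_chunks lists every block per dimension; the first block is the
    # regular chunk size (only the last block may be smaller).
    return tuple(dim_chunks[0] for dim_chunks in chunks)


target = resolve_auto_chunks({0: "auto", 1: -1}, (10000, 10000), "float64")
print(target)
```

Inside `rechunk`, such a helper could run before planning, using the source array's `shape` and `dtype`; whether the limit should come from dask's config or from `max_mem` is a design choice left open here.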

shz9 · Dec 11 '20 19:12

We would welcome a pull request if you feel comfortable trying to implement this yourself. 😊

rabernat · Dec 11 '20 19:12