zarr-python
feat: change array creation signature to allow sharding specification [do not merge]
The goal of this PR is to demonstrate one strategy to simplify the creation of arrays that use sharding. Don't consider merging this until we get a good look at some alternatives.
This PR alters the `Array.create` routine, removing the `chunk_shape` kwarg and instead beefing up the semantics of the `chunks` kwarg. Specifically, the `chunks` kwarg supports a new variant, `ChunkSpec`, which aims to compactly specify both the chunk shape of an array and the (optional) sub-chunk shape.
`ChunkSpec` is a typed dictionary with two keys: `read_shape` and `write_shape`. `write_shape` specifies the shape of array chunks that can be written concurrently, i.e. the shape in array coordinates of the chunk files. `read_shape` specifies the shape of array chunks that can be read concurrently, i.e. the shape in array coordinates of the sub-chunks contained in a chunk constructed with a sharding codec.
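For readers who want to see the shape of the type, here is a minimal sketch of what a `ChunkSpec` could look like. It assumes a non-total `TypedDict` (as mentioned in this description) with tuple-of-int shapes; the exact annotations used in the PR may differ.

```python
from typing import TypedDict


class ChunkSpec(TypedDict, total=False):
    """Illustrative sketch only; field types are an assumption."""

    # Shape (in array coordinates) of the chunks that can be written
    # concurrently, i.e. the stored chunk files.
    write_shape: tuple[int, ...]
    # Shape (in array coordinates) of the chunks that can be read
    # concurrently, i.e. the sub-chunks inside a sharded chunk.
    read_shape: tuple[int, ...]


# Because the TypedDict is non-total, every key is optional,
# so all three of these are valid ChunkSpec values.
sharded: ChunkSpec = {"write_shape": (20, 20), "read_shape": (10, 10)}
unsharded: ChunkSpec = {"write_shape": (20, 20)}
empty: ChunkSpec = {}
```

Making the dictionary non-total is what allows `{}` to type-check, which is why the empty case has to be handled alongside `None`.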
- passing `chunks = None` or `chunks = {}` (we support the latter case because of how non-total typeddicts work) to `Array.create` will automatically specify chunks using old v2 logic.
- passing `chunks = {'write_shape': (20, 20)}` OR `chunks = {'read_shape': (20, 20)}` to `Array.create` will configure that array with no sharding and a chunk size of (20, 20).
- passing `chunks = {'write_shape': (20, 20), 'read_shape': (10, 10)}` to `Array.create` will configure that array with sharding, with a sub-chunk size of (10, 10) and a chunk size of (20, 20). This will also route all of the user-specified `codecs`, if any, to the sharding codec.
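The three cases above can be sketched as a small standalone function. This is not the code in this PR: `resolve_chunks` and `ResolvedChunks` are hypothetical names used only for illustration, and a `chunk_shape` of `None` stands in for "fall back to the old v2 chunk-guessing logic".

```python
from typing import NamedTuple, Optional, TypedDict


class ChunkSpec(TypedDict, total=False):
    write_shape: tuple[int, ...]
    read_shape: tuple[int, ...]


class ResolvedChunks(NamedTuple):
    # Hypothetical result type, not part of zarr-python.
    chunk_shape: Optional[tuple[int, ...]]      # None: guess chunks automatically
    sub_chunk_shape: Optional[tuple[int, ...]]  # None: no sharding


def resolve_chunks(chunks: Optional[ChunkSpec]) -> ResolvedChunks:
    """Map a ChunkSpec onto (chunk shape, optional sub-chunk shape)."""
    # Case 1: None or {} means chunks are chosen automatically.
    if not chunks:
        return ResolvedChunks(None, None)

    write = chunks.get("write_shape")
    read = chunks.get("read_shape")

    # Case 2: only one key given means plain chunking, no sharding.
    if write is None or read is None:
        shape = write if write is not None else read
        return ResolvedChunks(tuple(shape), None)

    # Case 3: both keys given means sharding, with write_shape as the
    # chunk (shard) shape and read_shape as the sub-chunk shape.
    return ResolvedChunks(tuple(write), tuple(read))
```

In the sharded case, a real implementation would additionally wrap the user-specified codecs in the sharding codec; that step is omitted here.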
Note that this PR does not change the signature of the array class itself. That would be a separate effort.
addresses #2170
TODO:
- [ ] Add unit tests and/or doctests in docstrings
- [ ] Add docstrings and API docs for any new/modified user-facing classes and functions
- [ ] New/modified features documented in docs/tutorial.rst
- [ ] Changes documented in docs/release.rst
- [ ] GitHub Actions have all passed
- [ ] Test coverage is 100% (Codecov passes)