Refactor CodecPipeline for flexibility
Zarr version
v3
Numcodecs version
na
Python Version
na
Operating System
na
Installation
na
Description
Currently, the CodecPipeline interface works by passing around Iterable[tuple[...]], with a different tuple shape for each method (see, for example, decode: https://github.com/zarr-developers/zarr-python/blob/5ff3fbe5fe1488310301e9d2ae56a9880d1ddfb2/src/zarr/abc/codec.py#L115):
- decode: Iterable[tuple[CodecOutput | None, ArraySpec]]
- encode: Iterable[tuple[CodecInput | None, ArraySpec]]
- read: Iterable[tuple[ByteGetter, ArraySpec, SelectorTuple, SelectorTuple, bool]]
- write: Iterable[tuple[ByteSetter, ArraySpec, SelectorTuple, SelectorTuple, bool]]
At the moment, we have no way to evolve the interface in a backwards-compatible way: adding an element to any of these tuples breaks every implementation that unpacks them. https://github.com/zarr-developers/zarr-python/discussions/2845 noted an accidental API break.
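To make that failure mode concrete, here is a minimal, self-contained sketch. It does not use zarr's real types: plain strings stand in for ByteGetter, ArraySpec and the selectors, and old_style_read is a hypothetical pipeline method. Code written against a 4-element tuple fails as soon as a caller starts passing a fifth element.

```python
# Hypothetical stand-ins: plain strings replace ByteGetter, ArraySpec and SelectorTuple.
from collections.abc import Iterable


def old_style_read(batch_info: Iterable[tuple]) -> list[str]:
    """A hypothetical pipeline method written when each tuple had four elements."""
    results = []
    for byte_getter, array_spec, chunk_selection, out_selection in batch_info:
        results.append(f"{byte_getter}:{array_spec}")
    return results


four = [("getter", "spec", "chunk_sel", "out_sel")]
five = [("getter", "spec", "chunk_sel", "out_sel", True)]  # caller appended a bool

print(old_style_read(four))  # ['getter:spec']
try:
    old_style_read(five)  # fixed-arity unpacking raises ValueError
except ValueError as err:
    print(f"ValueError: {err}")
```

Nothing in the type hints warns the implementer; the break only shows up at runtime.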
One option for gracefully evolving the interface, which I might need for https://github.com/zarr-developers/zarr-python/issues/2904, is to replace the tuples with dataclasses. We can safely add new optional fields (fields with default values) to the dataclass without breaking backwards compatibility.
We can define __len__ and __iter__ on the dataclasses and freeze their return values to the current API, so that existing code that unpacks the tuples keeps working:
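```python
from collections.abc import Iterator
from dataclasses import dataclass
from zarr.abc.codec import CodecOutput
from zarr.core.array_spec import ArraySpec

@dataclass(frozen=True, eq=True)
class DecodeChunksAndSpecs:
    codec_output: CodecOutput | None
    array_spec: ArraySpec

    # Mimic the current 2-tuple so that `chunk, spec = item` keeps working.
    def __len__(self) -> int:
        return 2

    def __iter__(self) -> Iterator[object]:
        yield self.codec_output
        yield self.array_spec
```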
We could also emit a warning when fields are accessed through iteration or by position, to encourage pipeline implementations to migrate to attribute access.
Steps to reproduce
na
Another issue is that CodecPipeline.evolve_from_array_spec is currently never called. We need the ArrayMetadata and ArrayConfig in zarrs-python to properly support a broader range of Zarr V2 arrays and configurations. Also, it would be very helpful if the array store could be passed to the CodecPipeline constructor.
Right now it looks like zarrs-python is the only public user of CodecPipeline. IMHO you should just break this API for zarr-python 3.1.
cc: @ilan-gold