Codec creation: getting the shape of a chunk?

Open csubich opened this issue 2 years ago • 2 comments

Is there any way for a codec (as applied to encoding/decoding within zarr) to be reliably provided the shape of the chunk it is decoding?

My use case is writing codecs that apply dynamic scaling and quantization (based on planes of a 3+-dimensional array, normalizing by the local min/max within a chunk) and/or two-dimensional linear prediction (essentially extending numcodecs.Delta).

When calling Codec.encode() this is not a problem; the buffer supplied is a full array-like unless an earlier filter stage has done something. However, on decoding the codec is only reliably supplied a byte-stream without shape information. The out parameter to decode() seems to be inconsistently supplied.

Obviously, Zarr knows what shape of chunk it is seeking to fill. Without that information, I'll have to encode the array shape information in the output datastream. That's unnecessary redundancy, and more importantly it is aesthetically displeasing.
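For concreteness, the workaround described above could be sketched as follows, using only the standard library. The function names and the header layout (one byte for the number of dimensions, then one little-endian uint64 per dimension) are assumptions made for illustration, not anything defined by numcodecs or zarr.

```python
import struct


def add_shape_header(shape, payload):
    """Prefix payload with ndim (1 byte) and each dimension (uint64, little-endian).

    Hypothetical helper: the header layout is an illustrative choice.
    """
    header = struct.pack(f"<B{len(shape)}Q", len(shape), *shape)
    return header + bytes(payload)


def split_shape_header(buf):
    """Inverse of add_shape_header: return (shape, payload)."""
    view = memoryview(buf)
    (ndim,) = struct.unpack_from("<B", view, 0)
    shape = struct.unpack_from(f"<{ndim}Q", view, 1)
    offset = 1 + 8 * ndim  # 1 byte for ndim, 8 bytes per dimension
    return tuple(shape), bytes(view[offset:])


# Round trip: the decoder recovers the shape without help from the caller.
encoded = add_shape_header((4, 3, 2), b"abc")
shape, payload = split_shape_header(encoded)
```

A codec's encode() would call the first helper on its output and decode() would call the second before reshaping, which costs 1 + 8 * ndim bytes per chunk to store something zarr already knows.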

csubich avatar Oct 13 '23 14:10 csubich

I have previously pushed for the concept of a "context" that zarr would pass to both the codec's encode/decode methods and the storage layer, specifying where in the array we are, the shape, the key, and other useful pieces of information that are available at call time. Currently, the context (zarr.context.Context) only has meta_array: NDArrayLike; I see no reason not to populate it further.
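As a rough illustration of what a more populated context might look like, here is a sketch. Apart from meta_array, every field name below (chunk_shape, chunk_coords, chunk_key) is hypothetical, as are the class name and the helper; none of them is an existing zarr API.

```python
from typing import Any, Optional, Tuple, TypedDict


class ChunkContext(TypedDict, total=False):
    # Field mentioned in this thread as the only one currently present.
    meta_array: Any
    # Hypothetical additions, named here only for illustration.
    chunk_shape: Tuple[int, ...]   # shape of the chunk being encoded/decoded
    chunk_coords: Tuple[int, ...]  # position of the chunk in the chunk grid
    chunk_key: str                 # storage key of the chunk


def decoded_shape(context: ChunkContext,
                  fallback: Optional[Tuple[int, ...]] = None):
    """Return the chunk shape from the context if present, else the fallback."""
    return context.get("chunk_shape", fallback)


# A codec's decode() could consult the context instead of a header in the data.
ctx = ChunkContext(chunk_shape=(4, 3), chunk_key="0.1")
shape = decoded_shape(ctx)
```

Keeping every field optional (total=False) means existing callers that pass only meta_array, or nothing at all, would keep working.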

martindurant avatar Oct 18 '23 15:10 martindurant

Context of the chunk within the larger super-array would also be interesting, since it could allow some special-case encoders that apply data transforms along the way.

csubich avatar Oct 18 '23 15:10 csubich