consider removing default compressor / filters / serializer from config

Open d-v-b opened this issue 7 months ago • 1 comments

Our config right now contains this logic for defining a default encoding scheme for a given data type:

https://github.com/zarr-developers/zarr-python/blob/af55fcfaefa42b5ef556b1b5be33dcdd06a7fd0b/src/zarr/core/config.py#L85-L107

This approach is problematic because it requires dividing our data types into separate categories that are not well defined: is a fixed-length UTF-32 data type a "string" type or a "numeric" type?

Given the changes coming in #2874, I propose the following alteration to our approach here:

  • Pull this stuff out of the config entirely.

  • Confine all this logic to a single function for automatically picking a chunk encoding based on a data type + a requested chunk encoding. This function should also check for incompatibility between a data type and a requested chunk encoding. For example, if someone requests a variable-length string data type but does not specify vlen-utf8 as a serializer, then they should get a clear, early error.
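
As a rough illustration of the second point, here is a minimal sketch of what such a single function might look like. Everything in it is hypothetical: `ChunkEncoding`, `REQUIRED_SERIALIZER`, `resolve_chunk_encoding`, the plain-string data type and codec names, and the `"bytes"` / `"zstd"` fallbacks are stand-ins chosen for the example, not zarr-python's actual API or defaults. It also assumes that an omitted serializer means "pick one for me", while an explicitly requested serializer that conflicts with the data type is an error.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


# Hypothetical, simplified stand-in for a resolved chunk encoding.
# Codecs are plain strings here; real code would use codec objects.
@dataclass(frozen=True)
class ChunkEncoding:
    serializer: str
    compressors: Tuple[str, ...] = ()
    filters: Tuple[str, ...] = ()


# Assumed mapping from illustrative data type names to the only serializer
# that can encode them. Types not listed here use the generic fallback.
REQUIRED_SERIALIZER = {
    "variable_length_utf8": "vlen-utf8",
    "variable_length_bytes": "vlen-bytes",
}


def resolve_chunk_encoding(
    dtype: str,
    serializer: Optional[str] = None,
    compressors: Optional[Tuple[str, ...]] = None,
    filters: Optional[Tuple[str, ...]] = None,
) -> ChunkEncoding:
    """Pick defaults for anything left as None and reject bad combinations."""
    required = REQUIRED_SERIALIZER.get(dtype)

    if serializer is None:
        # Nothing requested: choose automatically from the data type.
        serializer = required if required is not None else "bytes"
    elif required is not None and serializer != required:
        # Fail early, before any array metadata or chunks are written.
        raise ValueError(
            f"data type {dtype!r} requires serializer {required!r}, "
            f"got {serializer!r}"
        )
    elif required is None and serializer in REQUIRED_SERIALIZER.values():
        # A variable-length serializer makes no sense for a fixed-size type.
        raise ValueError(
            f"serializer {serializer!r} is not compatible with data type {dtype!r}"
        )

    if compressors is None:
        compressors = ("zstd",)  # assumed default, for illustration only
    if filters is None:
        filters = ()

    return ChunkEncoding(serializer, tuple(compressors), tuple(filters))
```

With this shape, `resolve_chunk_encoding("variable_length_utf8")` fills in `vlen-utf8` on its own, while `resolve_chunk_encoding("variable_length_utf8", serializer="bytes")` raises a `ValueError` immediately. The compatibility rules live in one place rather than being spread across config categories.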

These would be breaking changes, but our current approach is, IMO, unworkable.

d-v-b, May 30 '25 08:05

note: any changes to the config API, e.g. those in #2874, should be preceded by deprecations wherever possible. This is supported by donfig: https://donfig.readthedocs.io/en/latest/configuration.html#deprecations.

xref: https://github.com/zarr-developers/zarr-python/pull/2874#discussion_r2132945621

d-v-b, Jun 07 '25 12:06

closed via #3228

d-v-b, Jul 13 '25 13:07