zarr-python

Use config to select implementation

Open brokkoli71 opened this issue 1 year ago • 5 comments

fixes #1878

Using the config (https://github.com/pytroll/donfig), the user can now specify the implementation of all codecs, the CodecPipeline, Buffer and NDBuffer. For each of these objects, the codec registry can hold multiple implementations and will use the one selected by the config.

Further changes:

  • All calls on the Buffer and NDBuffer classes are now made on the selected implementation
  • Registry was expanded to register codec-pipelines, buffers and ndbuffers
  • Moved registry.py from zarr.codecs.registry to zarr.registry
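
To illustrate the selection mechanism described above, here is a minimal sketch of a registry that holds several implementations per kind of object and returns the one named in the config. It is not the code from this PR: the plain dict standing in for the donfig config, the functions `register` and `get_class`, and the two placeholder buffer classes are all hypothetical names chosen for illustration.

```python
# Hypothetical sketch: none of these names are zarr-python's actual API.
from typing import Dict

# Stand-in for the donfig-backed config: kind -> name of the selected implementation.
config: Dict[str, str] = {"buffer": "NumpyLikeBuffer"}

# Registry: kind -> {implementation name -> class}.
_registry: Dict[str, Dict[str, type]] = {}


def register(kind: str, cls: type) -> None:
    """Register cls as one available implementation of kind."""
    _registry.setdefault(kind, {})[cls.__name__] = cls


def get_class(kind: str) -> type:
    """Return the implementation of kind that the config currently selects."""
    selected = config[kind]
    try:
        return _registry[kind][selected]
    except KeyError:
        raise KeyError(
            f"no {kind!r} implementation named {selected!r} is registered"
        ) from None


class NumpyLikeBuffer:
    """Placeholder for a CPU-backed buffer."""


class GpuLikeBuffer:
    """Placeholder for a GPU-backed buffer."""


register("buffer", NumpyLikeBuffer)
register("buffer", GpuLikeBuffer)

default_cls = get_class("buffer")   # config still names NumpyLikeBuffer
config["buffer"] = "GpuLikeBuffer"  # the user changes the config
selected_cls = get_class("buffer")  # the lookup now follows the new setting
```

Because the lookup happens on every call to `get_class`, a config change takes effect without re-registering anything.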

brokkoli71 avatar Jun 20 '24 14:06 brokkoli71

@madsbk I was wondering what you think about overriding the default_buffer_prototype via config. Do you think that is a good idea or unnecessary?

normanrz avatar Jun 27 '24 13:06 normanrz

@madsbk I was wondering what you think about overriding the default_buffer_prototype via config. Do you think that is a good idea or unnecessary?

I think it would be a good idea, or maybe add a default attribute to each AsyncArray instance, which would be set to the config value if not specified when creating the array?

madsbk avatar Jun 28 '24 11:06 madsbk

@madsbk

I think it would be a good idea, or maybe add a default attribute to each AsyncArray instance, which would be set to the config value if not specified when creating the array?

Do you mean that, in addition to the BufferPrototype parameter in e.g. setitem, there would be another fallback BufferPrototype stored in the AsyncArray instance, which could be set when the array is created? So the decision of which buffer to use would be:

prototype in setitem → prototype in AsyncArray instance → config → numpy (with "→" being the fallback if previous was not set)
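
The chain above could be sketched roughly as follows. This is only an illustration of the lookup order, not code from the PR: `resolve_prototype`, the `"buffer_prototype"` config key, and the use of plain strings in place of real BufferPrototype objects are all assumptions.

```python
from typing import Dict, Optional

# Hypothetical stand-ins; strings take the place of real BufferPrototype objects.
NUMPY_PROTOTYPE = "numpy"
config: Dict[str, str] = {}  # stand-in for the donfig config


def resolve_prototype(
    call_prototype: Optional[str] = None,
    array_prototype: Optional[str] = None,
) -> str:
    """Return the first prototype that is set, walking the fallback chain."""
    if call_prototype is not None:   # 1. passed explicitly to setitem
        return call_prototype
    if array_prototype is not None:  # 2. stored on the AsyncArray instance
        return array_prototype
    configured = config.get("buffer_prototype")
    if configured is not None:       # 3. selected in the config
        return configured
    return NUMPY_PROTOTYPE           # 4. final fallback
```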

brokkoli71 avatar Jun 28 '24 20:06 brokkoli71

Yes, exactly, but I see your point, it might be a few too many fallbacks :)

In any case, if we allow modification of default_buffer_prototype, I think we need another constant like numpy_buffer_prototype that is always backed by a numpy array for internal use. E.g. when reading the shard index, we always want to use numpy: https://github.com/zarr-developers/zarr-python/blob/v3/src/zarr/codecs/sharding.py#L610
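
A rough sketch of that separation, under the assumption that the default is looked up from the config on each call while the internal constant is never touched. The dict standing in for the config, the helper `shard_index_prototype`, and the string placeholders are hypothetical; only the names default_buffer_prototype and numpy_buffer_prototype come from this discussion.

```python
from typing import Dict

# Hypothetical sketch; strings stand in for real BufferPrototype objects.
config: Dict[str, str] = {"default_buffer_prototype": "numpy"}  # stand-in config

# Pinned constant for internal use; it never follows the config.
NUMPY_BUFFER_PROTOTYPE = "numpy"


def default_buffer_prototype() -> str:
    """User-facing default: follows whatever the config selects."""
    return config["default_buffer_prototype"]


def shard_index_prototype() -> str:
    """Internal reads, such as the shard index, always use the pinned constant."""
    return NUMPY_BUFFER_PROTOTYPE


# A user switches the default; internal reads are unaffected.
config["default_buffer_prototype"] = "gpu"
```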

madsbk avatar Jun 29 '24 09:06 madsbk

good point! @madsbk

brokkoli71 avatar Jul 01 '24 11:07 brokkoli71