[v3] revisit runtime config

Open jhamman opened this issue 1 year ago • 1 comments

This issue tracks an evaluation of the v3 runtime config.

Context

The v3 branch runtime config currently looks like this:

https://github.com/zarr-developers/zarr-python/blob/76c345071db950b2362f7588ad20da4a1af03b85/src/zarr/v3/config.py#L34-L38

This is then attached to Array/Group classes

https://github.com/zarr-developers/zarr-python/blob/76c345071db950b2362f7588ad20da4a1af03b85/src/zarr/v3/array.py#L51-L55

A few things are missing here:

  1. User experience
    • as a user, I may want to set config settings and forget about them (e.g. order, concurrency)
  2. Portability
    • I don't know for sure, but I really doubt that putting the asyncio event loop on the Array class is going to work when it comes to serialization
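
The serialization concern can be checked with a small standard-library experiment. This is a minimal sketch, not the zarr code: the settings dict and its keys (`order`, `concurrency`, `asyncio_loop`) are hypothetical stand-ins for whatever ends up attached to an Array, and `is_picklable` is a helper written for this example.

```python
import asyncio
import pickle


def is_picklable(obj: object) -> bool:
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:
        # The exact exception type depends on which attribute pickle trips over first.
        return False
    return True


# Plain settings (hypothetical stand-in for order/concurrency options).
plain_settings = {"order": "C", "concurrency": 10}

loop = asyncio.new_event_loop()
try:
    # The same settings, but with a live event loop attached.
    settings_with_loop = {**plain_settings, "asyncio_loop": loop}
    plain_ok = is_picklable(plain_settings)
    loop_ok = is_picklable(settings_with_loop)
finally:
    loop.close()

print(f"plain settings picklable: {plain_ok}")
print(f"settings holding a loop picklable: {loop_ok}")
```

If the loop-bearing object fails to pickle, as expected here, anything that ships arrays between processes would likely hit the same problem, which argues for keeping the loop out of per-object config.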

Improvements

So I'm looking for ideas on how to manage this better. Two ideas:

  1. Xarray style set-options: https://docs.xarray.dev/en/stable/generated/xarray.set_options.html
    • Pros: allows for validation and is typed
    • Cons: a bit bespoke, doesn't support environment variables or a config file option
  2. Dask style config - https://donfig.readthedocs.io/en/latest/
    • Pros: very flexible framework, support for environment variables and config files, nested namespaces, etc.
    • Cons: extra dependency (though we could vendor it), no typing or validation
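
To make the first idea concrete, here is a rough sketch of what an xarray-style `set_options` could look like using only the standard library. It is not xarray's implementation: the option names, defaults, and validators below are assumptions chosen for illustration.

```python
from typing import Any, Callable, Dict

# Hypothetical option names and defaults, for illustration only.
OPTIONS: Dict[str, Any] = {"order": "C", "concurrency": None}

_VALIDATORS: Dict[str, Callable[[Any], bool]] = {
    "order": lambda v: v in ("C", "F"),
    "concurrency": lambda v: v is None
    or (isinstance(v, int) and not isinstance(v, bool) and v > 0),
}


class set_options:
    """Set options globally, or temporarily when used as a context manager."""

    def __init__(self, **kwargs: Any) -> None:
        # Validate everything first so a bad value leaves OPTIONS untouched.
        for name, value in kwargs.items():
            if name not in OPTIONS:
                raise ValueError(f"unknown option {name!r}")
            if not _VALIDATORS[name](value):
                raise ValueError(f"invalid value {value!r} for option {name!r}")
        # Remember previous values so __exit__ can restore them.
        self._old = {name: OPTIONS[name] for name in kwargs}
        OPTIONS.update(kwargs)

    def __enter__(self) -> "set_options":
        return self

    def __exit__(self, *exc_info: object) -> None:
        OPTIONS.update(self._old)


with set_options(order="F"):
    inside = OPTIONS["order"]  # temporarily overridden
after = OPTIONS["order"]  # restored on exit
```

This shows the trade-off listed above: validation is easy to add, but environment variables and config files would have to be layered on separately.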

What do we expect to go in the runtime config?

  • Order
  • Concurrency
  • Logging settings
  • what else?

jhamman • Apr 05 '24 19:04

I spoke with @maxrjones today about this. Our thought for now was to try using donfig and see how it goes. We can continue to evaluate the dependency vs vendoring and typing/validation as needed.
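
One draw of the dask-style approach is environment-variable support. The sketch below mimics the dask-style naming convention (a prefix, then `__` for nesting) with the standard library only. It is not donfig's code, and the `ZARR_` prefix and key names are assumptions for illustration.

```python
import ast
from typing import Any, Dict, Mapping


def collect_env(prefix: str, env: Mapping[str, str]) -> Dict[str, Any]:
    """Turn PREFIX_A__B=value variables into a nested {"a": {"b": value}} dict."""
    result: Dict[str, Any] = {}
    for name, raw in env.items():
        if not name.startswith(prefix):
            continue
        keys = name[len(prefix):].lower().split("__")
        try:
            # Parse numbers, booleans, lists, etc.; fall back to the raw string.
            value: Any = ast.literal_eval(raw)
        except (ValueError, SyntaxError, TypeError):
            value = raw
        node = result
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return result


# A fake environment keeps the example independent of the real os.environ.
fake_env = {"ZARR_ARRAY__ORDER": "F", "ZARR_ASYNC__CONCURRENCY": "10", "HOME": "/root"}
overrides = collect_env("ZARR_", fake_env)
print(overrides)
```

The sketch does not handle conflicting variables (for example a value and a nested key under the same name); a real library would need to.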

cc @djhoese

jhamman • Apr 19 '24 02:04

Additional config options:

  • Specify alternate implementations for CodecPipeline (e.g. for a rust-based codec pipeline)
  • Specify alternate implementations for codecs (e.g. for GPU-based batch-aware codecs)
  • Batch size in the HybridCodecPipeline
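
One common way to let a config value select an alternate implementation is to store an import path as a string and resolve it at runtime. The following is a sketch under that assumption. The `codec_pipeline` key is hypothetical, and `collections.OrderedDict` is only a stand-in for a real pipeline class, which is not available here.

```python
import importlib
from typing import Any, Dict

# Hypothetical config entry: which class to use, named by its import path.
config: Dict[str, str] = {"codec_pipeline": "collections.OrderedDict"}


def resolve(dotted_path: str) -> Any:
    """Import and return the object named by 'package.module.Name'."""
    module_name, _, attr = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.Name', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


pipeline_cls = resolve(config["codec_pipeline"])
print(pipeline_cls)
```

With this pattern, a third-party package (for example a Rust-backed pipeline) would only need to be importable; the core library would not have to know about it in advance.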

normanrz • May 08 '24 18:05

Thanks @normanrz! Joe mentioned you asked about this today. I'm working on getting a minimal PR opened now and should have it submitted within the next couple of hours.

maxrjones • May 08 '24 18:05

@normanrz - #1855 is now in the v3 branch. That should clear the way to add additional config options as needed.

jhamman • May 10 '24 21:05

Thanks @maxrjones for getting this moving!

jhamman • May 10 '24 21:05

I think this is a great way of dealing with configurations. Thanks!

normanrz • May 14 '24 12:05