
[v3] Design and implement storage transformer API

Open jhamman opened this issue 1 year ago • 0 comments

Summary

The V3 specification introduced a new Zarr abstraction: the storage transformer. A storage transformer modifies a request to read or write data before passing that request on to the next transformer or to the store. Transformers can be chained to form a pipeline of operations, as shown in the following diagram:

[Diagram not reproduced: a pipeline of storage transformers in front of a store]
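To make the request-forwarding idea above concrete, here is a minimal Python sketch. It is not Zarr-Python's API: every name in it (Store, DictStore, PrefixTransformer, Crc32Transformer) is hypothetical, and it assumes a synchronous store that maps string keys to bytes. Each transformer wraps the next transformer or the store, rewrites the key and/or the data, and forwards the request.

```python
import zlib
from typing import Dict, Optional, Protocol


class Store(Protocol):
    """Hypothetical minimal store interface: bytes addressed by string keys."""

    def get(self, key: str) -> Optional[bytes]: ...

    def set(self, key: str, value: bytes) -> None: ...


class DictStore:
    """In-memory store backed by a plain dict."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self.data.get(key)

    def set(self, key: str, value: bytes) -> None:
        self.data[key] = value


class PrefixTransformer:
    """Hypothetical transformer: rewrites the key (adds a prefix), then forwards."""

    def __init__(self, inner: Store, prefix: str) -> None:
        self.inner = inner
        self.prefix = prefix

    def get(self, key: str) -> Optional[bytes]:
        return self.inner.get(self.prefix + key)

    def set(self, key: str, value: bytes) -> None:
        self.inner.set(self.prefix + key, value)


class Crc32Transformer:
    """Hypothetical transformer: appends a CRC-32 on write, verifies and strips it on read."""

    def __init__(self, inner: Store) -> None:
        self.inner = inner

    def get(self, key: str) -> Optional[bytes]:
        raw = self.inner.get(key)
        if raw is None:
            return None
        payload, stored = raw[:-4], int.from_bytes(raw[-4:], "little")
        if zlib.crc32(payload) != stored:
            raise ValueError(f"checksum mismatch for key {key!r}")
        return payload

    def set(self, key: str, value: bytes) -> None:
        self.inner.set(key, value + zlib.crc32(value).to_bytes(4, "little"))


# Chain: requests flow PrefixTransformer -> Crc32Transformer -> DictStore.
store = DictStore()
array_view = PrefixTransformer(Crc32Transformer(store), prefix="data/")
array_view.set("c/0/0", b"abc")
```

The order of wrapping matters: in this chain the checksum layer already sees the prefixed key, and the store ends up holding a single entry, "data/c/0/0", whose value is the 3 payload bytes plus a 4-byte checksum.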

The initial implementation of the v3 spec in Zarr-Python included a first pass at storage transformers (#1096), but a fresh start is likely needed because both the spec and Zarr-Python's internal design have evolved since then.

Will Zarr-Python 3 support any storage transformers? Initially, probably not, but the intent is to support them eventually, even if only via plugins.

Initial storage transformers

Designing the storage transformer API without any target transformers in mind is probably not a good idea. In fact, a few proposed spec extensions would fit well here:

  • https://github.com/zarr-developers/zarr-specs/issues/82
  • https://github.com/zarr-developers/zarr-specs/issues/287

Have any others been discussed that this list misses? Is there any v3 data in the wild that uses storage transformers?

Design

The basic flow of storage transformers is fairly straightforward:

  1. the array metadata is decoded to produce a pipeline of 0 to N transformers
  2. when the array fetches data, the key is transformed by each element of the transformer pipeline and then passed through to the store
  3. when the array writes data, the key and data are passed through the transformer pipeline and then on to the store
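The three steps above could be sketched roughly as follows. This is an illustration under several assumptions, not a proposed implementation: StorageTransformerPipeline is only the name floated in this issue, not an existing class; the transformer names "key_prefix" and "crc32_suffix", the FACTORIES table, and the transform_key/encode/decode interface are hypothetical; the store is a plain dict; and the array metadata is assumed to list transformers under "storage_transformers" as objects with "name" and "configuration" fields.

```python
import zlib
from typing import Any, Callable, Dict, List, Optional


class KeyPrefix:
    """Hypothetical transformer that nests every key under a prefix."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix

    def transform_key(self, key: str) -> str:
        return self.prefix + key

    def encode(self, data: bytes) -> bytes:
        return data  # payload is left untouched

    def decode(self, data: bytes) -> bytes:
        return data


class Crc32Suffix:
    """Hypothetical transformer that appends a CRC-32 to each stored value."""

    def transform_key(self, key: str) -> str:
        return key  # keys are left untouched

    def encode(self, data: bytes) -> bytes:
        return data + zlib.crc32(data).to_bytes(4, "little")

    def decode(self, data: bytes) -> bytes:
        payload, stored = data[:-4], int.from_bytes(data[-4:], "little")
        if zlib.crc32(payload) != stored:
            raise ValueError("checksum mismatch")
        return payload


# Hypothetical lookup table from a metadata "name" to a constructor.
FACTORIES: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "key_prefix": lambda cfg: KeyPrefix(cfg["prefix"]),
    "crc32_suffix": lambda cfg: Crc32Suffix(),
}


class StorageTransformerPipeline:
    def __init__(self, transformers: List[Any], store: Dict[str, bytes]) -> None:
        self.transformers = transformers
        self.store = store

    @classmethod
    def from_metadata(
        cls, metadata: Dict[str, Any], store: Dict[str, bytes]
    ) -> "StorageTransformerPipeline":
        # Step 1: decode the metadata into a pipeline of 0..N transformers.
        specs = metadata.get("storage_transformers", [])
        transformers = [FACTORIES[s["name"]](s.get("configuration", {})) for s in specs]
        return cls(transformers, store)

    def _store_key(self, key: str) -> str:
        # Each element of the pipeline transforms the key in turn.
        for t in self.transformers:
            key = t.transform_key(key)
        return key

    def get(self, key: str) -> Optional[bytes]:
        # Step 2: transform the key, read from the store, then undo the
        # data transformations in reverse order.
        raw = self.store.get(self._store_key(key))
        if raw is None:
            return None
        for t in reversed(self.transformers):
            raw = t.decode(raw)
        return raw

    def set(self, key: str, value: bytes) -> None:
        # Step 3: key and data pass through every transformer before the store.
        for t in self.transformers:
            value = t.encode(value)
        self.store[self._store_key(key)] = value
```

Decoding in reverse order on read mirrors the write path, so each transformer undoes only its own change. Real transformers may need more than a one-to-one key mapping (for example, reading a byte range out of a larger object), which this sketch does not model.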

From here, we need to settle on an internal API (e.g. StorageTransformerPipeline) and a position on how new storage transformers will be developed and/or registered with Zarr-Python.

jhamman, Mar 20 '24 03:03