zarr-python

how to abstract over v2 / v3 metadata differences

Open d-v-b opened this issue 7 months ago • 5 comments

Zarr array metadata differs between Zarr v2 and v3. v3 introduces concepts like chunk_grid, chunk_key_encoding, etc. The AsyncArray class has to handle both metadata types, so somewhere in the codebase we need a Zarr-version-agnostic representation of the two.

Right now, we define extra properties on the array v2 metadata class to make it resemble the array v3 metadata. See this example. Doing this introduces coupling between the two metadata models, which is a mistake IMO because they are separable entities. It is also unclear how we would refactor things if we were to ever introduce a new type of array metadata document.

I think a better solution to this problem is for routines that consume the two flavors of metadata documents (right now, AsyncArray) to be responsible for abstracting over the differences between v2 and v3 metadata. In concrete terms, I propose removing properties like ArrayV2Metadata.chunk_grid, and instead moving that logic to the AsyncArray class.
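To make the proposal concrete, here is a minimal sketch of what "the consumer abstracts over the difference" could look like. The classes `V2Meta`, `V3Meta`, and `ArrayLike` are hypothetical, simplified stand-ins and not the real `ArrayV2Metadata`, `ArrayV3Metadata`, or `AsyncArray`. The only thing taken from the specs is that v2 stores `chunks` directly, while v3 nests the chunk shape under `chunk_grid`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class V2Meta:
    """Hypothetical stand-in for a v2 metadata model: only v2 spec fields."""
    shape: tuple
    chunks: tuple  # v2 stores the chunk shape directly


@dataclass(frozen=True)
class V3Meta:
    """Hypothetical stand-in for a v3 metadata model: only v3 spec fields."""
    shape: tuple
    chunk_grid: dict  # v3 nests the chunk shape under chunk_grid


class ArrayLike:
    """Hypothetical consumer that owns the version-agnostic view."""

    def __init__(self, metadata: Union[V2Meta, V3Meta]) -> None:
        self.metadata = metadata

    @property
    def chunks(self) -> tuple:
        md = self.metadata
        if isinstance(md, V2Meta):
            return tuple(md.chunks)
        if isinstance(md, V3Meta):
            # Assumes a regular chunk grid, as laid out in the v3 spec.
            return tuple(md.chunk_grid["configuration"]["chunk_shape"])
        raise TypeError(f"unsupported metadata type: {type(md).__name__}")


v2 = ArrayLike(V2Meta(shape=(10, 10), chunks=(5, 5)))
v3 = ArrayLike(
    V3Meta(
        shape=(10, 10),
        chunk_grid={"name": "regular", "configuration": {"chunk_shape": [5, 5]}},
    )
)
```

With this layout, neither metadata model needs to know the other exists, and a third metadata flavor would only add a branch in the consumer.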

d-v-b avatar May 22 '25 08:05 d-v-b

> I propose removing properties like ArrayV2Metadata.chunk_grid

Will you still be able to deduce chunk_grid from an ArrayV3Metadata class? VirtualiZarr has many routines that consume metadata documents, but only as ArrayV3Metadata (see #2986).

TomNicholas avatar May 22 '25 08:05 TomNicholas

chunk_grid is a field defined in the v3 metadata, so it's correct for an ArrayV3Metadata object to have that field. The problem is that chunk_grid is not a field defined in the v2 spec, so it shouldn't be an attribute of our model of the v2 metadata document.

d-v-b avatar May 22 '25 08:05 d-v-b

If it's generally useful to define a chunk grid from an ArrayV2Metadata document, then we can define that operation as a function, which AsyncArray, as well as any other consumer, could use.
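A rough sketch of such a function, assuming a hypothetical helper name `regular_chunk_grid_from_v2` and a simplified stand-in `V2Meta` class (neither exists in zarr-python under these names). It returns a plain dict in the shape the v3 spec uses for a regular chunk grid, rather than any library object:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class V2Meta:
    """Hypothetical stand-in for a v2 metadata model."""
    shape: tuple
    chunks: tuple


def regular_chunk_grid_from_v2(metadata: V2Meta) -> dict:
    """Derive a v3-style regular chunk grid description from v2 metadata.

    This lives outside the metadata model, so the model itself only
    carries fields that the v2 spec defines.
    """
    return {
        "name": "regular",
        "configuration": {"chunk_shape": list(metadata.chunks)},
    }


grid = regular_chunk_grid_from_v2(V2Meta(shape=(100, 100), chunks=(10, 20)))
```

Because the conversion is a free function, a consumer that only works with v3-style chunk grids can call it without going through an array class.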

d-v-b avatar May 22 '25 08:05 d-v-b

A PR that could close this issue would do the following:

  • deprecate all the ectopic v3 metadata fields exposed as properties on ArrayV2Metadata
  • define functionally equivalent routines for AsyncArray, perhaps as invocations of reusable functions

And after a few releases we could remove those unnecessary fields from the metadata documents.

d-v-b avatar May 22 '25 08:05 d-v-b

Sounds good - that shouldn't cause VirtualiZarr any issues.

TomNicholas avatar May 22 '25 08:05 TomNicholas