how to abstract over v2 / v3 metadata differences
Zarr array metadata differs between Zarr v2 and v3: v3 introduces concepts like chunk_grid, chunk_key_encoding, etc. The AsyncArray class has to handle both metadata types, so somewhere in the codebase we need a Zarr-version-agnostic representation of the two.
Right now, we define extra properties on the array v2 metadata class to make it resemble the array v3 metadata. See this example. Doing this introduces coupling between the two metadata models, which is a mistake IMO because they are separable entities. It is also unclear how we would refactor things if we were to ever introduce a new type of array metadata document.
I think a better solution to this problem is for routines that consume the two flavors of metadata documents (right now, AsyncArray) to be responsible for abstracting over the differences between v2 and v3 metadata. In concrete terms, I propose removing properties like ArrayV2Metadata.chunk_grid, and instead moving that logic to the AsyncArray class.
> I propose removing properties like ArrayV2Metadata.chunk_grid
Will you still be able to deduce chunk_grid from an ArrayV3Metadata class? VirtualiZarr has many routines that consume metadata documents, but only as ArrayV3Metadata (see #2986).
chunk_grid is a field defined in the v3 metadata, so it's correct for an ArrayV3Metadata object to have that field. The problem is that chunk_grid is not a field defined in the v2 spec, so it shouldn't be an attribute of our model of the v2 metadata document.
If it's generally useful to derive a chunk grid from an ArrayV2Metadata document, then we can define that operation as a function, which AsyncArray and any other consumer could use.
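A rough sketch of what such a standalone function could look like. The classes below (V2Meta, V3Meta, RegularGrid) and the function name get_chunk_grid are hypothetical stand-ins invented for illustration, not the actual zarr-python types; the only facts assumed from the specs are that v2 metadata stores `chunks` directly while v3 metadata defines a `chunk_grid`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class V2Meta:
    """Hypothetical stand-in for ArrayV2Metadata: the v2 spec stores `chunks` directly."""
    shape: tuple[int, ...]
    chunks: tuple[int, ...]


@dataclass(frozen=True)
class RegularGrid:
    """Hypothetical stand-in for a v3 regular chunk grid."""
    chunk_shape: tuple[int, ...]


@dataclass(frozen=True)
class V3Meta:
    """Hypothetical stand-in for ArrayV3Metadata: the v3 spec defines `chunk_grid`."""
    shape: tuple[int, ...]
    chunk_grid: RegularGrid


def get_chunk_grid(metadata: V2Meta | V3Meta) -> RegularGrid:
    """Return a chunk grid for either metadata flavor (hypothetical helper).

    The version-specific logic lives here, in the consumer-side function,
    so the v2 model does not need a v3-style `chunk_grid` attribute.
    """
    if isinstance(metadata, V3Meta):
        # v3 already carries a chunk grid, so return it unchanged.
        return metadata.chunk_grid
    if isinstance(metadata, V2Meta):
        # v2 chunks are always regular, so build the grid from `chunks`.
        return RegularGrid(chunk_shape=metadata.chunks)
    raise TypeError(f"unsupported metadata type: {type(metadata).__name__}")
```

With this shape, AsyncArray (or a downstream consumer) would call `get_chunk_grid(metadata)` instead of reading `metadata.chunk_grid`, and supporting a future metadata flavor would mean adding one branch here rather than changing the existing models.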
a PR that could close this issue would do the following:
- deprecate all the ectopic v3 metadata fields exposed as properties on ArrayV2Metadata
- define functionally equivalent routines for AsyncArray, perhaps as invocations of reusable functions.
And after a few releases we could remove those unnecessary properties from ArrayV2Metadata.
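One way the deprecation step might look, as a minimal sketch. V2Meta and chunk_grid_from_v2 are hypothetical names used only for illustration, and the returned dict simply mirrors the JSON shape of a regular chunk grid in the v3 spec; the real classes in the codebase would differ.

```python
from __future__ import annotations

import warnings
from dataclasses import dataclass


def chunk_grid_from_v2(metadata: V2Meta) -> dict:
    """Hypothetical reusable function: derive a v3-style regular chunk grid."""
    return {
        "name": "regular",
        "configuration": {"chunk_shape": list(metadata.chunks)},
    }


@dataclass(frozen=True)
class V2Meta:
    """Hypothetical stand-in for ArrayV2Metadata."""
    shape: tuple[int, ...]
    chunks: tuple[int, ...]

    @property
    def chunk_grid(self) -> dict:
        # Ectopic v3 field: keep it working for a few releases, but warn
        # callers and delegate to the standalone function.
        warnings.warn(
            "V2Meta.chunk_grid is deprecated; use chunk_grid_from_v2(metadata)",
            DeprecationWarning,
            stacklevel=2,
        )
        return chunk_grid_from_v2(self)
```

Existing callers keep getting the same value during the deprecation window, while new code (including AsyncArray) can call the function directly and never touch the property.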
Sounds good - that shouldn't cause VirtualiZarr any issues.