zarr-specs
Explicitly listing groups/arrays inside group metadata?
I'm curious if explicitly listing groups/arrays inside group metadata has been discussed before.
The downside is that this is redundant and potentially duplicate information (but in some sense so is all group metadata, see "implicit groups").
One advantage is that this would eliminate the need to list the contents of a store and check for the existence of metadata objects, which can be rather expensive on some stores. It's kind of a half-way step to the consolidated metadata of Zarr v2.
It's also potentially useful for making group creation/modification atomic without race conditions like https://github.com/zarr-developers/zarr-python/issues/1435, because the canonical list of a group's contents would be a single metadata file rather than a collection of sub-directories, which usually cannot be modified in an atomic fashion.
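To make the comparison concrete, here is a minimal sketch, not a proposal for the actual schema. It assumes one `zarr.json` document per node and a hypothetical `children` key in group metadata; the dict-backed store, the helper names, and the trimmed-down metadata documents are all made up for illustration.

```python
import json

def put(store, key, doc):
    """Write a JSON document into a dict-backed stand-in for a store."""
    store[key] = json.dumps(doc).encode()

store = {}
# "children" is a hypothetical key, not part of any Zarr spec.
put(store, "zarr.json", {
    "zarr_format": 3,
    "node_type": "group",
    "children": {"temperature": "array", "forecasts": "group"},
})
# Metadata documents are trimmed to the fields this sketch needs.
put(store, "temperature/zarr.json", {"zarr_format": 3, "node_type": "array"})
put(store, "forecasts/zarr.json", {"zarr_format": 3, "node_type": "group", "children": {}})
store["temperature/c/0/0"] = b"\x00"  # a chunk, to show listing also sees data keys

def children_by_listing(store, prefix=""):
    """Discover children by scanning keys, then reading each child's metadata."""
    found = {}
    for key in store:  # stands in for a potentially expensive list operation
        if not key.startswith(prefix):
            continue
        parts = key[len(prefix):].split("/")
        if len(parts) == 2 and parts[1] == "zarr.json":
            found[parts[0]] = json.loads(store[key])["node_type"]
    return found

def children_from_metadata(store, prefix=""):
    """Discover children with a single read of the group's own metadata."""
    doc = json.loads(store[prefix + "zarr.json"])
    return dict(doc.get("children", {}))
```

Both functions return the same mapping here, but the second needs one `get` and no listing. The atomicity argument follows the same shape: adding a child becomes one write to the parent's metadata document, assuming the store can replace a single object atomically.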
We have been discussing an extension that would provide links to parents and children so that, from any point within the hierarchy, you could navigate up or down the tree without having to list the store. This is conceptually similar to how STAC works.
This approach would be more scalable than the current consolidated metadata approach, with the obvious tradeoff that walking the tree one node at a time will often be more expensive than loading a single mapping of all the metadata.
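A rough sketch of what link-based navigation could look like, under assumptions that are mine rather than the extension's: a hypothetical `links` object in each node's metadata holding the parent's path (`None` at the root) and a list of child names, with a plain dict standing in for the store.

```python
import json

def meta_key(path):
    """Key of the metadata document for a node path ('' is the root)."""
    return f"{path}/zarr.json" if path else "zarr.json"

def node(node_type, parent, children=()):
    # "links" is a hypothetical field used only for this illustration.
    return json.dumps({
        "zarr_format": 3,
        "node_type": node_type,
        "links": {"parent": parent, "children": list(children)},
    }).encode()

store = {
    meta_key(""): node("group", None, ["a", "b"]),
    meta_key("a"): node("group", "", ["x"]),
    meta_key("a/x"): node("array", "a"),
    meta_key("b"): node("array", ""),
}

def walk(store, path=""):
    """Yield (path, node_type) depth-first: one metadata read per node, no listing."""
    doc = json.loads(store[meta_key(path)])
    yield path, doc["node_type"]
    for name in doc["links"]["children"]:
        yield from walk(store, f"{path}/{name}" if path else name)

def root_of(store, path):
    """Follow parent links upward until a node with no parent is reached."""
    while True:
        parent = json.loads(store[meta_key(path)])["links"]["parent"]
        if parent is None:
            return path
        path = parent
```

The tradeoff mentioned above is visible in `walk`: it issues one read per node, whereas a consolidated mapping would be a single read of a document that grows with the whole hierarchy.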
cc @rabernat and @jedsundwall
Adding @kbgg who's working on this.