
Why is path required for opening Zarr v3 groups?

Open shoyer opened this issue 2 years ago • 5 comments

With Zarr v2, I can open a group by passing either a valid Zarr store or a path specified as a string, e.g., zarr.open_group(store_or_path). As I understand it, paths get normalized into store objects, e.g., for a local filesystem or via fsspec.

With Zarr v3, as currently implemented, the path argument is apparently now required, per https://github.com/pydata/xarray/pull/6475. This feels like a small step backwards in terms of usability. I'm wondering if I'm missing some broader context here? Maybe some examples of how users would canonically create a group, add an array and then access the data in the new v3 API would be helpful.

shoyer avatar May 25 '22 18:05 shoyer

The short version is that previously all the (meta)data needed to open a group was at one location, but now there is root metadata, and the metadata tree is separated from the data tree. The result is that opening a group is much more like opening a Zip file: "open foo.zip and then load /some/file"

cc: @grlee77 for a more complete backstory.

@jbms has proposed some form of notation to simplify the usage. Taking the zip example: foo.zip#/some/file or foo.zip//some/file.
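To make the single-string idea concrete, here is a minimal sketch of how the `#` form could be split into a store location and a node path. The function name `split_store_and_node` and the fallback to `/` when no node is given are assumptions of this sketch, not part of any proposal or of zarr-python. The `//` variant is left out because it would be ambiguous with URL schemes such as `s3://`.

```python
from typing import Tuple


def split_store_and_node(spec: str, default_node: str = "/") -> Tuple[str, str]:
    """Split 'foo.zip#/some/file' into ('foo.zip', '/some/file').

    Hypothetical helper: the '#' separator follows the example in this
    thread; falling back to the root node is an assumption of this sketch.
    """
    store, sep, node = spec.partition("#")
    if not store:
        raise ValueError(f"no store location in {spec!r}")
    if not sep or not node:
        # No node path given: fall back to the default (the root).
        return store, default_node
    if not node.startswith("/"):
        # Normalize 'some/file' to '/some/file'.
        node = "/" + node
    return store, node
```

With this, `split_store_and_node("foo.zip#/some/file")` yields the store `foo.zip` and the node `/some/file`, while a bare `foo.zip` maps to the root node.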

joshmoore avatar Jun 22 '22 15:06 joshmoore

It seems like the v3 spec has support for a "root" group or array: https://zarr-specs.readthedocs.io/en/latest/core/v3.0.html#storage

Could we simply make that the default for zarr.open_group? I.e., group='/'?

I would be much happier with this sort of default in Zarr rather than in Xarray. We already have one too many domain specific extensions to Zarr in Xarray!

shoyer avatar Jun 22 '22 16:06 shoyer

Making the root group the default seems like a reasonable choice, but I think it would be nice to more generally be able to specify a zarr group or array to open with just a single string.

On that broader point we could continue the discussion here: https://github.com/zarr-developers/zarr-specs/issues/132

jbms avatar Jun 22 '22 18:06 jbms

> Could we simply make that the default for zarr.open_group? I.e., group='/'?

I did overlook the following statement in the spec about root.group.json and root.array.json, and I agree that we should add that to the v3 support here. I will try to add that in the coming week or so.

> If the root node is a group, the metadata key is “meta/root.group.json”. If the root node is an array, the metadata key is “meta/root.array.json”, and the data keys are formed by concatenating “data/root/” and the chunk identifier.
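As a rough illustration of the key layout quoted above, the following sketch builds the metadata and data keys as plain strings. The function names are hypothetical, and the handling of non-root paths (`meta/root/<path>.<type>.json`) is my reading of the draft spec rather than something stated in the quote, so treat that branch as an assumption.

```python
def metadata_key(path: str, node_type: str) -> str:
    """Return the metadata key for a node in the v3 draft layout."""
    if node_type not in ("group", "array"):
        raise ValueError(f"unknown node type: {node_type!r}")
    normalized = path.strip("/")
    if not normalized:
        # Root node, as quoted from the spec: meta/root.group.json
        # or meta/root.array.json.
        return f"meta/root.{node_type}.json"
    # Assumption (not in the quote): non-root nodes live under meta/root/.
    return f"meta/root/{normalized}.{node_type}.json"


def root_array_data_key(chunk_id: str) -> str:
    """Data key for a chunk of a root array: 'data/root/' + chunk identifier."""
    return "data/root/" + chunk_id
```

The chunk identifier is treated as an opaque string here; its exact format is defined elsewhere in the spec.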

The zarrita and xtensor-zarr implementations provide a Hierarchy class that represents the root and can be opened by just giving the directory name. This hierarchy is just a collection of nodes, where each node can be an array or a group. So, the hierarchy object can be opened without specifying any particular path, as it represents the root of the zarr store.
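A toy sketch of that idea, using a plain dict as the store: the object is constructed from the store alone, and every method defaults to the root path. This is not the zarrita or xtensor-zarr API; the class body, method names, and the JSON payload are all assumptions made for illustration, as is the key layout for non-root nodes.

```python
import json
from typing import MutableMapping, Optional


class Hierarchy:
    """Toy root-level hierarchy over a key/value store (illustrative only)."""

    def __init__(self, store: MutableMapping[str, bytes]) -> None:
        # No path argument: the hierarchy represents the root of the store.
        self.store = store

    @staticmethod
    def _meta_key(path: str, node_type: str) -> str:
        normalized = path.strip("/")
        if not normalized:
            return f"meta/root.{node_type}.json"
        # Assumed layout for non-root nodes.
        return f"meta/root/{normalized}.{node_type}.json"

    def create_group(self, path: str = "/", attributes: Optional[dict] = None) -> str:
        """Write group metadata at `path` and return the key that was used."""
        key = self._meta_key(path, "group")
        self.store[key] = json.dumps({"attributes": attributes or {}}).encode()
        return key

    def node_type(self, path: str = "/") -> str:
        """Report whether the node at `path` is a group or an array."""
        for candidate in ("group", "array"):
            if self._meta_key(path, candidate) in self.store:
                return candidate
        raise KeyError(f"no node at {path!r}")

    def open_group(self, path: str = "/") -> dict:
        """Load the group metadata at `path`; the root is the default."""
        key = self._meta_key(path, "group")
        if key not in self.store:
            raise KeyError(f"no group at {path!r}")
        return json.loads(self.store[key])
```

Because `path` defaults to `/` everywhere, `Hierarchy(store).open_group()` opens the root group without the caller naming a path, which is the behavior requested earlier in this thread.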

The concept of a hierarchy is discussed in the spec, although specific definitions of the methods present on a hierarchy are not given. I am aware of the following two Hierarchy class implementations: the zarrita Hierarchy definition and the xtensor-zarr Hierarchy definition. Both are very similar to the Group class itself in having methods to create arrays or groups, and it was not clear to me whether the standard requires a Hierarchy class independent of Group.

grlee77 avatar Jun 29 '22 14:06 grlee77

Quick update: I have a PR nearly done for root array/group support in v3. I just need to take another look later today at one failing test case.

grlee77 avatar Jul 18 '22 17:07 grlee77