
`FSStore` not handling `.zmetadata` correctly when `dimension_separator` is set to `/` at store level.

Open • tasansal opened this issue 3 years ago • 0 comments

Zarr version

v2.12.0

Numcodecs version

0.10.2

Python Version

3.10

Operating System

Linux

Installation

using pip into virtual environment

Description

When an FSStore is created with `dimension_separator="/"`, the `.zmetadata` file is also renamed to `/zmetadata`. I have tested this on Google Cloud Storage.
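To illustrate how this renaming can arise, here is a simplified, hypothetical model of store-level key normalization. The function `normalize_key` and the `META_KEYS` tuple are illustrative assumptions, not zarr's actual code. The model rewrites `.` to the separator in the last path segment, so that chunk keys like `0.1` become `0/1`, and exempts only a fixed list of metadata names. If `.zmetadata` is missing from that list, it turns into `/zmetadata`.

```python
# Hypothetical, simplified model of store-level key normalization.
# This is NOT zarr's source; it only mimics the behavior reported here.
META_KEYS = (".zarray", ".zgroup", ".zattrs")  # assumed exempt metadata names


def normalize_key(key: str, separator: str = "/", exempt: tuple = META_KEYS) -> str:
    """Replace '.' with `separator` in the last path segment unless exempt."""
    *parents, last = key.strip("/").split("/")
    if last not in exempt:
        # Chunk keys such as "0.1" become "0/1" as intended,
        # but ".zmetadata" becomes "/zmetadata" as well.
        last = last.replace(".", separator)
    return "/".join(parents + [last])


print(normalize_key("my_array/0.1"))  # my_array/0/1
print(normalize_key(".zgroup"))       # .zgroup
print(normalize_key(".zmetadata"))    # /zmetadata
print(normalize_key(".zmetadata", exempt=META_KEYS + (".zmetadata",)))  # .zmetadata
```

Under this model, adding `.zmetadata` to the exempt names is one plausible direction for a fix.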

This causes inconsistent behavior between local and cloud stores. Locally, the files are laid out like this:

/root
|_ my_array
|_ .zgroup
|_ zmetadata

Note the missing `.` in `.zmetadata`.

On cloud storage, this is what happens instead:

/root
|_ my_array
|_ .zgroup
|_ /
  |_ zmetadata

Note the extra `/` prefix before `zmetadata` and the missing `.`.

It appears that the local FSStore (or its path handler) collapses the double slash in `path_to_root//zmetadata` into a single one, whereas the cloud stores keep it.
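The local/cloud difference is consistent with how a doubled slash is treated. The sketch below models the two cases with only `posixpath.normpath` and plain string splitting. It is an illustration under that assumption, not the actual fsspec or GCS code path, and `test.zarr` is simply the store name from the reproduction steps.

```python
import posixpath

root = "test.zarr"         # store root, as in the reproduction steps
key = "/zmetadata"         # the mangled key produced by store-level normalization

joined = root + "/" + key  # naive join gives "test.zarr//zmetadata"

# A local filesystem resolves repeated slashes, much like normpath does.
local_path = posixpath.normpath(joined)

# An object store treats the key as an opaque string, so "//" survives and
# the empty segment between the slashes shows up as an extra "directory".
object_key_parts = joined.split("/")

print(joined)            # test.zarr//zmetadata
print(local_path)        # test.zarr/zmetadata
print(object_key_parts)  # ['test.zarr', '', 'zmetadata']
```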

As a result, `zarr.open_consolidated` fails in some edge cases. For example, creating a Zarr store on-prem and copying it to Google Cloud does not work, because the two file layouts differ.

The ideal layout would be:

/root
|_ my_array
|_ .zgroup
|_ .zmetadata

As usual, `my_array` has its dimension separator set to `/`, and Zarr should handle that properly.
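In the Zarr v2 format, the dimension separator only determines how chunk indices are joined into chunk keys: with `/`, chunk `(0, 1)` of `my_dataset` is stored under `my_dataset/0/1`, and with the default `.` under `my_dataset/0.1`. The helper `chunk_keys` below is a hypothetical illustration of that layout, not part of zarr's API. It assumes a regular chunk grid like the 5×5 array with 1×1 chunks in the reproduction steps.

```python
from itertools import product


def chunk_keys(array_name, shape, chunks, separator="/"):
    """Yield the store key of every chunk in a regular chunk grid (hypothetical helper)."""
    # Number of chunks along each dimension (ceiling division).
    grid = [-(-size // chunk) for size, chunk in zip(shape, chunks)]
    for index in product(*(range(n) for n in grid)):
        # In this model the separator only joins chunk indices; metadata keys
        # such as .zarray, .zgroup, or .zmetadata are never built here.
        yield array_name + "/" + separator.join(str(i) for i in index)


keys = list(chunk_keys("my_dataset", shape=(5, 5), chunks=(1, 1)))
print(len(keys), keys[0], keys[-1])  # 25 my_dataset/0/0 my_dataset/4/4
print(next(chunk_keys("my_dataset", (5, 5), (1, 1), separator=".")))  # my_dataset/0.0
```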

One possible workaround is NOT to use `dimension_separator` at the store level, but to pass it when creating each array instead. However, this is error-prone: it has to be specified every time an array is created, or the store can end up with a mix of `.`- and `/`-separated arrays.

This works as expected on both cloud and local storage:

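```python
import zarr

# Workaround: no dimension_separator on the store itself...
store = zarr.storage.FSStore("test.zarr", mode="w")
root = zarr.open_group(store)
# ...instead, pass it explicitly when creating each array.
ds = root.create_dataset("my_dataset", shape=(5, 5), chunks=(1, 1), dtype="float32", overwrite=True, dimension_separator="/")
# Consolidated metadata is written to .zmetadata at the store root.
zarr.consolidate_metadata(store)
```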

Steps to reproduce

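```python
import zarr

# dimension_separator is set once, at the store level.
store = zarr.storage.FSStore("test.zarr", mode="w", dimension_separator="/")
root = zarr.open_group(store)
root.create_dataset("my_dataset", shape=(5, 5), chunks=(1, 1), dtype="float32", overwrite=True)
# Expected: .zmetadata at the root. Observed: the key is written as /zmetadata.
zarr.consolidate_metadata(store)
```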

Additional output

No response

tasansal, Sep 04 '22 15:09