Interoperability with Xarray Zarr
It would be really nice if we could interoperate with zarr coming from elsewhere, like xarray. Currently, if I run the following Python code to create the simplest zarr store:
```python
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {"foo": (("x", "y"), np.random.rand(4, 5))},
    coords={
        "x": [10, 20, 30, 40],
        "y": pd.date_range("2000-01-01", periods=5),
        "z": ("x", list("abcd")),
    },
)
# same result for format 2 or 3
ds.to_zarr("test.zarr", zarr_format=2, consolidated=False, mode="w")
```
xarray fails to ingest it via the netcdf4 engine (`xr.open_dataset("test.zarr", engine="netcdf4")`), but more importantly I can't get `ncdump -h` to even recognize it properly.
If I run `ncdump -h test.zarr`, I get `ncdump: test.zarr: NetCDF: Unknown file format`.
Digging further, I tried `ncdump -h "file://test.zarr#mode=zarr,file"`, which at least seems to trigger the Zarr support, but I still get `ncdump: file://test.zarr#mode=zarr,file: NetCDF: NCZarr error`.
My first ask: can we please, please try to get some auto-detection working for Zarr here? Going to a full URI using custom fragments for a local file/directory seems pretty rough from a UX perspective.
Secondly, is there anything we can loosen so this works?
Agreed. I was working on auto-detection but got sidetracked on Zarr v3. I am surprised the xarray store did not work; I thought I had at least one test for it. Can you try `ncdump -k` to see if it shows any useful info? Also, print out the file metadata and post it here; I may be able to spot a problem.
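For "print out the file metadata", a minimal stdlib-only sketch that collects every Zarr v2 metadata file under a store directory may help. `dump_zarr_metadata` is a hypothetical helper, demonstrated here on a tiny hand-written store rather than the actual `test.zarr`:

```python
import json
import pathlib
import tempfile


def dump_zarr_metadata(root):
    """Collect every Zarr v2 metadata file (.zgroup/.zarray/.zattrs) under root."""
    found = {}
    for name in (".zgroup", ".zarray", ".zattrs"):
        for path in sorted(pathlib.Path(root).rglob(name)):
            found[str(path.relative_to(root))] = json.loads(path.read_text())
    return found


# Demo on a minimal hand-written v2 store (stand-in for test.zarr):
root = pathlib.Path(tempfile.mkdtemp())
(root / ".zgroup").write_text(json.dumps({"zarr_format": 2}))
(root / "z").mkdir()
(root / "z" / ".zarray").write_text(json.dumps({
    "zarr_format": 2, "shape": [4], "chunks": [4], "dtype": "<U1",
    "compressor": None, "fill_value": "", "order": "C", "filters": None,
}))

for path, meta in dump_zarr_metadata(root).items():
    print(path, "->", meta)
```

Posting that output makes the offending `dtype` entries immediately visible without attaching a zip.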
Ugh, I meant to include the generated zarr. I'll note that in the issue linked above, it's suggested that this is something that broke in 4.9.3.
that file is empty.
😱 And today I learned that you have to opt in with `zip` to have it zip everything in a directory. sigh
I updated the link in my comment above to have a corrected zip.
OK, the reason it fails is that the file uses the `U` dtype, which is PY_UNICODE, which is apparently deprecated as of Python 3.12; and as of Python 3.15, UTF-8 is the default. The Zarr v2 spec has not, AFAIK, been changed to correspond. Frankly, I am not sure what kind of hack I should use to support PY_UNICODE. Suggestions welcome.
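For context on what the parser is facing: `<U1` is NumPy's typestring for a fixed-width, little-endian UCS-4 string of length 1, so each element occupies 4 bytes of code points, whereas `|S1` is one raw byte per element. (Whether `S` would be an acceptable mapping target here is exactly the open question above, not something this sketch settles.)

```python
import numpy as np

# "<U1": fixed-width UCS-4 unicode, one character -> 4 bytes per element.
u = np.dtype("<U1")
# "|S1": fixed-width raw bytes, one byte per element.
s = np.dtype("|S1")

print(u.kind, u.itemsize)  # U 4
print(s.kind, s.itemsize)  # S 1
```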
This issue is partly addressed by PR https://github.com/Unidata/netcdf-c/pull/3218. As noted, the real problem here is the use of dtype `<U1`.