
Interoperability with Xarray Zarr

Open · dopplershift opened this issue 3 months ago · 6 comments

It would be really nice if we could interoperate with Zarr stores produced elsewhere, such as by xarray. Currently, if I run the following Python code to create about the simplest possible Zarr store:

```python
import numpy as np
import pandas as pd

import xarray as xr

ds = xr.Dataset(
    {"foo": (("x", "y"), np.random.rand(4, 5))},
    coords={
        "x": [10, 20, 30, 40],
        "y": pd.date_range("2000-01-01", periods=5),
        "z": ("x", list("abcd")),
    },
)
# same result for format 2 or 3
ds.to_zarr("test.zarr", zarr_format=2, consolidated=False, mode="w")
```

xarray fails to ingest it (`xr.open_dataset("test.zarr", engine="netcdf4")`), but more importantly, I can't even get `ncdump -h` to recognize it properly.

If I run `ncdump -h test.zarr`, I get `ncdump: test.zarr: NetCDF: Unknown file format`.

Digging further, I tried `ncdump -h file://test.zarr#mode=zarr,file`, which at least seems to trigger the Zarr support, but I still get `ncdump: file://test.zarr#mode=zarr,file: NetCDF: NCZarr error`.

My first ask: can we please, please try to get some auto-detection working for Zarr here? Having to use a full URI with custom fragments for a local file/directory seems pretty rough from a UX perspective.
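For a local directory, auto-detection could plausibly key off the metadata files each Zarr version writes at the store root: `.zgroup`/`.zarray` (plus `.zmetadata` when consolidated) for Zarr v2, and `zarr.json` for Zarr v3. Below is a minimal Python sketch of that check under this assumption; `detect_zarr_format` is a hypothetical helper, not part of netcdf-c or any library, and netcdf-c's real detection logic (in C) may work differently.

```python
from pathlib import Path

def detect_zarr_format(path):
    """Hypothetical helper: guess whether a local directory is a Zarr store.

    Returns 2 or 3 for a Zarr v2 or v3 store, or None if no Zarr metadata
    file is found at the top level of the directory.
    """
    root = Path(path)
    if not root.is_dir():
        return None
    # Zarr v3 stores keep their metadata in a zarr.json document.
    if (root / "zarr.json").is_file():
        return 3
    # Zarr v2 stores keep metadata in .zgroup / .zarray files
    # (.zmetadata appears only when the store is consolidated).
    if any((root / name).is_file() for name in (".zgroup", ".zarray", ".zmetadata")):
        return 2
    return None
```

For the store written by the example above with `zarr_format=2`, this check would be expected to report 2, since xarray writes a `.zgroup` at the root of a v2 store.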

Secondly, is there anything we can loosen so this works?

dopplershift, Nov 20 '25

Agreed. I was working on auto-detection but got sidetracked by Zarr v3. I am surprised the xarray file did not work; I thought I had at least one test for it. Can you try `ncdump -k` to see if it shows any useful information? Also, please print out the file's metadata and post it here. I may be able to spot a problem.
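For the metadata request above: a Zarr v2 store on disk is a directory of JSON documents (`.zgroup`, `.zattrs`, `.zarray`, plus `.zmetadata` when consolidated) next to the chunk files, so its metadata can be printed with the standard library alone. A minimal sketch, assuming a v2 store on local disk; `dump_zarr_v2_metadata` is a hypothetical name.

```python
import json
from pathlib import Path

# Zarr v2 metadata file names (.zmetadata only appears in consolidated stores).
V2_METADATA_FILES = (".zgroup", ".zattrs", ".zarray", ".zmetadata")

def dump_zarr_v2_metadata(path):
    """Print every Zarr v2 metadata document found under ``path``.

    Hypothetical helper for sharing store metadata in a bug report.
    Chunk files are skipped; only the JSON metadata is printed.
    Returns the list of metadata files that were printed.
    """
    root = Path(path)
    found = []
    for meta in sorted(root.rglob("*")):
        if meta.is_file() and meta.name in V2_METADATA_FILES:
            found.append(meta)
            print(f"== {meta.relative_to(root).as_posix()}")
            # Re-serialize so the output is consistently indented.
            print(json.dumps(json.loads(meta.read_text()), indent=2, sort_keys=True))
    return found
```

Calling `dump_zarr_v2_metadata("test.zarr")` prints each document under a `== <relative path>` header, which is convenient to paste into an issue.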

DennisHeimbigner, Nov 21 '25

Ugh, I meant to include the generated Zarr store. I'll note that the issue linked above suggests this is something that broke in 4.9.3.

test.zarr.zip

dopplershift, Nov 21 '25

That file is empty.

DennisHeimbigner, Nov 22 '25

😱 And today I learned that you have to opt in (`zip -r`) to have zip include everything in a directory. Sigh.
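The same recursive archive can be built from Python with `shutil.make_archive`, which walks the whole directory tree, so nothing beneath the store (including dot-files like `.zgroup`) is dropped. A minimal sketch; `zip_zarr_store` is a hypothetical helper name.

```python
import os
import shutil

def zip_zarr_store(store_path):
    """Zip a directory-based Zarr store, including everything beneath it.

    Hypothetical helper. Writes ``<store>.zip`` next to the store, with
    entries prefixed by the store's own name, e.g. ``test.zarr/.zgroup``.
    Returns the path of the archive.
    """
    # abspath also strips any trailing slash, so the split below is clean.
    store_path = os.path.abspath(store_path)
    parent, name = os.path.split(store_path)
    # base_name: archive path without extension (".zip" is appended).
    # root_dir:  directory that archive entry paths are relative to.
    # base_dir:  subtree to include, relative to root_dir.
    return shutil.make_archive(store_path, "zip", root_dir=parent, base_dir=name)
```

Usage: `zip_zarr_store("test.zarr")` produces `test.zarr.zip` in the same directory as the store.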

I updated the link in my comment above to have a corrected zip.

dopplershift, Dec 01 '25

OK, the reason it fails is that the file uses the `U` dtype, i.e. Py_UNICODE, which is apparently deprecated as of Python 3.12. And as of Python 3.15, UTF-8 will be the default. AFAIK, the Zarr V2 spec has not been changed to correspond. Frankly, I am not sure what kind of hack I should use to support Py_UNICODE. Suggestions welcome.
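One possible approach, sketched in Python rather than C for brevity: NumPy's `U` dtype is fixed-width UCS-4, so each character occupies 4 bytes, with byte order given by the `<` (little-endian) or `>` (big-endian) prefix. A reader could therefore decode `<U1` data as UTF-32-LE without relying on Python's Py_UNICODE at all. `decode_fixed_unicode` is a hypothetical helper; it assumes the chunk bytes have already been decompressed, and how netcdf-c should then expose the strings (for example as NC_STRING or NC_CHAR) is a design choice this sketch does not settle.

```python
def decode_fixed_unicode(raw, dtype_str):
    """Decode decompressed bytes of a fixed-width NumPy unicode array.

    Hypothetical helper. NumPy's "U" dtype stores UCS-4: 4 bytes per
    character, byte order given by the "<" or ">" prefix of the dtype
    string (e.g. "<U1"). Strings shorter than the width are NUL-padded.
    """
    byteorder, kind, nchars = dtype_str[0], dtype_str[1], int(dtype_str[2:])
    if kind != "U" or byteorder not in ("<", ">"):
        raise ValueError(f"not a fixed-width unicode dtype: {dtype_str!r}")
    codec = "utf-32-le" if byteorder == "<" else "utf-32-be"
    width = 4 * nchars  # bytes per array element
    if len(raw) % width:
        raise ValueError("buffer length is not a multiple of the element size")
    out = []
    for start in range(0, len(raw), width):
        # Decode one element and drop the NUL padding code points.
        out.append(raw[start:start + width].decode(codec).rstrip("\x00"))
    return out
```

For example, `decode_fixed_unicode(np.array(list("abcd"), dtype="<U1").tobytes(), "<U1")` returns `['a', 'b', 'c', 'd']`; each result can then be encoded with `.encode("utf-8")` if UTF-8 bytes are needed.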

DennisHeimbigner, Dec 01 '25

This issue is partly addressed by PR https://github.com/Unidata/netcdf-c/pull/3218. As noted, the real problem here is the use of dtype `<U1`.
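To find which arrays in a Zarr v2 store use a NumPy `U` dtype such as `<U1`, one can read the `dtype` field of each `.zarray` document. A minimal sketch; `find_unicode_arrays` is a hypothetical helper, and it only handles simple string dtypes (structured dtypes, which Zarr v2 encodes as lists, are skipped). In the store from the original example, this would presumably flag the `z` coordinate built from `list("abcd")`.

```python
import json
from pathlib import Path

def find_unicode_arrays(path):
    """Return {array_path: dtype} for Zarr v2 arrays using a NumPy "U" dtype.

    Hypothetical helper; reads the "dtype" field of every .zarray file
    under ``path``.
    """
    root = Path(path)
    hits = {}
    for zarray in sorted(root.rglob(".zarray")):
        dtype = json.loads(zarray.read_text()).get("dtype")
        # Simple dtypes are strings like "<U1": byte order, kind, size.
        if isinstance(dtype, str) and len(dtype) > 1 and dtype[1] == "U":
            hits[zarray.parent.relative_to(root).as_posix()] = dtype
    return hits
```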

DennisHeimbigner, Dec 07 '25