
Break up intake-xarray into separate drivers?

Open danielballan opened this issue 5 years ago • 6 comments

Intake-xarray lumps together several different I/O backends. Grouping drivers by what they return (xarray in this case) is different from the usual pattern. We propose to break intake-xarray into intake-netcdf, intake-tiff with rasterio support (and maybe also tifffile support?), intake-png, and intake-zarr. We can maintain intake-xarray going forward as a metapackage that depends on these as a convenience and for back-compat.

danielballan, Feb 28 '20 16:02

You may well have a point, but the original rationale was that all of these call xarray loader functions.

martindurant, Feb 28 '20 16:02

Yes, I think that is sensible early on, just as we currently package a bunch of unrelated drivers together in databroker._drivers with the intention of splitting them up once things stabilize. Maybe this is worth doing eventually, with some shared utility library containing the xarray loader.

danielballan, Feb 28 '20 17:02

Seems fine to me...

jbednar, Mar 10 '20 18:03

Happy to write it up as a separate issue, but if this would help load zarrs which were not written with xarray, I'd also be in favor.

I'm currently running into a KeyError on _ARRAY_DIMENSIONS.

Stacktrace
Traceback (most recent call last):
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/xarray/backends/zarr.py", line 163, in _get_zarr_dims_and_attrs
    dimensions = zarr_obj.attrs[dimension_key]
  File "/Users/jamoore/opt/zarr/zarr/attrs.py", line 64, in __getitem__
    return self.asdict()[item]
KeyError: '_ARRAY_DIMENSIONS'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "intake_test.py", line 3, in <module>
    ds = cat.idr_ebi_6001240.to_dask()
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/intake_xarray/base.py", line 69, in to_dask
    return self.read_chunked()
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/intake_xarray/base.py", line 44, in read_chunked
    self._load_metadata()
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/intake/source/base.py", line 117, in _load_metadata
    self._schema = self._get_schema()
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/intake_xarray/base.py", line 18, in _get_schema
    self._open_dataset()
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/intake_xarray/xzarr.py", line 31, in _open_dataset
    self._ds = xr.open_zarr(self._mapper, **self.kwargs)
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/xarray/backends/zarr.py", line 599, in open_zarr
    ds = maybe_decode_store(zarr_store)
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/xarray/backends/zarr.py", line 582, in maybe_decode_store
    drop_variables=drop_variables,
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/xarray/conventions.py", line 570, in decode_cf
    vars, attrs = obj.load()
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/xarray/backends/common.py", line 123, in load
    (_decode_variable_name(k), v) for k, v in self.get_variables().items()
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/xarray/backends/zarr.py", line 290, in get_variables
    (k, self.open_store_variable(k, v)) for k, v in self.ds.arrays()
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/xarray/core/utils.py", line 402, in FrozenDict
    return Frozen(dict(*args, **kwargs))
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/xarray/backends/zarr.py", line 290, in <genexpr>
    (k, self.open_store_variable(k, v)) for k, v in self.ds.arrays()
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/xarray/backends/zarr.py", line 274, in open_store_variable
    dimensions, attributes = _get_zarr_dims_and_attrs(zarr_array, _DIMENSION_KEY)
  File "/opt/anaconda/envs/py36/lib/python3.6/site-packages/xarray/backends/zarr.py", line 167, in _get_zarr_dims_and_attrs
    "required for xarray to determine variable dimensions." % (dimension_key)
KeyError: 'Zarr object is missing the attribute `_ARRAY_DIMENSIONS`, which is required for xarray to determine variable dimensions.'
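
For anyone hitting the same error, here is a minimal sketch of a pre-flight check that lists which arrays in a store lack the `_ARRAY_DIMENSIONS` attribute that xarray asks for in the traceback above. It assumes a Zarr v2 directory store on the local filesystem, where each array has a `.zarray` file and an optional `.zattrs` JSON file next to it; consolidated metadata and remote stores are not handled. The function name `arrays_missing_dimensions` is made up for illustration and is not part of zarr, xarray, or intake.

```python
import json
from pathlib import Path

# Attribute name taken from the KeyError in the traceback above.
DIMENSION_KEY = "_ARRAY_DIMENSIONS"


def arrays_missing_dimensions(store_root):
    """Return relative paths of arrays that lack the xarray dimension attribute.

    Hypothetical helper: assumes a local Zarr v2 directory store in which
    every array directory contains a `.zarray` file and, optionally, a
    `.zattrs` JSON file holding user attributes.
    """
    root = Path(store_root)
    missing = []
    for zarray in sorted(root.rglob(".zarray")):
        array_dir = zarray.parent
        zattrs = array_dir / ".zattrs"
        # An array without a .zattrs file has no user attributes at all.
        attrs = json.loads(zattrs.read_text()) if zattrs.exists() else {}
        if DIMENSION_KEY not in attrs:
            missing.append(array_dir.relative_to(root).as_posix())
    return missing
```

If the returned list is non-empty, `xr.open_zarr` can be expected to fail on those arrays in the way shown above, and a plain array reader is probably the better fit.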

joshmoore, Apr 29 '20 13:04

> load zarrs which were not written with xarray

Like intake.source.zarr.ZarrArraySource ("ndzarr")?

martindurant, Apr 29 '20 13:04

Thanks, @martindurant. ZarrArraySource does work for my data.
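
For reference, a rough sketch of what that can look like. It assumes an intake release from the 0.x series that ships `intake.source.zarr.ZarrArraySource`, with zarr and dask installed; the wrapper name `open_plain_zarr` is hypothetical, and the `component` and `storage_options` arguments are passed through as I understand the driver's constructor, so check them against the installed version.

```python
def open_plain_zarr(urlpath, component=None, storage_options=None):
    """Open a Zarr array that has no xarray metadata as a dask array.

    Hypothetical wrapper around intake's "ndzarr" driver. Unlike
    xr.open_zarr, this path does not look for `_ARRAY_DIMENSIONS`.
    """
    # Imported lazily so the sketch can be defined without intake installed.
    from intake.source.zarr import ZarrArraySource

    source = ZarrArraySource(
        urlpath,
        component=component,              # path of the array inside a group, if any
        storage_options=storage_options,  # forwarded to the filesystem backend
    )
    return source.to_dask()
```

The result is a bare dask array without named dimensions or coordinates, which is the trade-off for skipping the xarray conventions.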

joshmoore, Apr 29 '20 14:04