cate Support BIOMASS dataset in Zarr Data Store

The Zarr Data Store contains data from the ODP that has been converted into the Zarr format. There is one BIOMASS dataset in the Zarr Data Store. For the purposes of this issue, we say that a dataset is supported when:

- it can be opened in cate
- it can be opened in cate with a spatial subset
- its content can be written to disk
- its data can be displayed in cate

The BIOMASS dataset cannot be opened with a spatial subset. The traceback is:

[2021-04-29 08:49:29] Request: open_dataset(datasetid=ESACCI-BIOMASS-L4-AGB-MERGED-100m-2010-2018-fv2.0.zarr, time_range=('2017-01-01', '2017-01-01'), var_names=['agb', 'agb_se'], region=[123.5265, 60.20374, 123.52827, 60.20552])

Traceback (most recent call last):
  File "test_cci_data_support.py", line 327, in test_open_ds
    dataset, _ = open_dataset(dataset_id=data_id,
  File "/home/users/tfincke/Projects/cate/cate/core/ds.py", line 432, in open_dataset
    dataset = select_subset(dataset, **subset_args)
  File "/home/users/tfincke/Projects/xcube/xcube/core/select.py", line 37, in select_subset
    dataset = select_spatial_subset(dataset, xy_bbox=bbox)
  File "/home/users/tfincke/Projects/xcube/xcube/core/select.py", line 85, in select_spatial_subset
    geo_coding = geo_coding if geo_coding is not None else GeoCoding.from_dataset(dataset, xy_names=xy_names)
  File "/home/users/tfincke/Projects/xcube/xcube/core/geocoding.py", line 132, in from_dataset
    return cls.from_xy((x, y), xy_names=(x_name, y_name))
  File "/home/users/tfincke/Projects/xcube/xcube/core/geocoding.py", line 169, in from_xy
    x, is_lon_normalized = _maybe_normalise_2d_lon(x)
  File "/home/users/tfincke/Projects/xcube/xcube/core/geocoding.py", line 462, in _maybe_normalise_2d_lon
    if _is_crossing_antimeridian(lon_var):
  File "/home/users/tfincke/Projects/xcube/xcube/core/geocoding.py", line 457, in _is_crossing_antimeridian
    return abs(lon_var.diff(dim=dim_x)).max() > 180.0 or
  File "/home/users/tfincke/miniconda3/envs/xcube/lib/python3.8/site-packages/xarray/core/dataarray.py", line 3107, in diff
    ds = self._to_temp_dataset().diff(n=n, dim=dim, label=label)
  File "/home/users/tfincke/miniconda3/envs/xcube/lib/python3.8/site-packages/xarray/core/dataset.py", line 5489, in diff
    variables[name] = var.isel(**kwargs_end) - var.isel(**kwargs_start)
  File "/home/users/tfincke/miniconda3/envs/xcube/lib/python3.8/site-packages/xarray/core/variable.py", line 2301, in func
    f(self_data, other_data)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 475. GiB for an array with shape (157500, 404999) and data type float64
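
For reference, the allocation size in the error message follows directly from the reported shape and dtype. A quick check, using only the numbers from the message itself:

```python
import numpy as np

# Shape and dtype are copied from the _ArrayMemoryError message.
shape = (157500, 404999)
itemsize = np.dtype("float64").itemsize  # 8 bytes per value

# A dense float64 array of this shape needs rows * cols * 8 bytes.
n_bytes = shape[0] * shape[1] * itemsize
n_gib = n_bytes / 2**30

print(f"{n_bytes} bytes = {n_gib:.1f} GiB")
```

So the `diff` along x tries to materialize one full-resolution 2D float64 array, which is roughly 475 GiB.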

Apr 29 '21 09:04 TonioF

Should be fixed in cate 3.0 by https://github.com/dcs4cop/xcube/issues/442
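
Independent of that fix, here is a minimal sketch of a spatial subset that only touches 1D coordinates and therefore never builds a 2D longitude array. It assumes the dataset exposes 1D `lon` and `lat` coordinate variables; the synthetic grid, the bounding box, and the helper name `select_bbox` are made up for illustration and are not part of cate or xcube.

```python
import numpy as np
import xarray as xr

# Small synthetic stand-in for the real cube; grid and values are made up.
lon = np.linspace(123.0, 124.0, 101)  # ascending, 0.01 degree steps
lat = np.linspace(61.0, 60.0, 101)    # descending (assumption about the layout)
ds = xr.Dataset(
    {"agb": (("lat", "lon"), np.zeros((lat.size, lon.size)))},
    coords={"lat": lat, "lon": lon},
)


def select_bbox(ds, bbox):
    """Hypothetical helper: label-based subset on 1D lon/lat coordinates."""
    x1, y1, x2, y2 = bbox
    # Slice bounds must follow the sort order of the latitude coordinate.
    lat_descending = float(ds.lat[0]) > float(ds.lat[-1])
    lat_slice = slice(y2, y1) if lat_descending else slice(y1, y2)
    return ds.sel(lon=slice(x1, x2), lat=lat_slice)


# Bounding box given as (lon_min, lat_min, lon_max, lat_max), made up here.
subset = select_bbox(ds, (123.205, 60.205, 123.395, 60.395))
print(dict(subset.sizes))
```

Label-based slicing like this works on the coordinate index only, so memory use is independent of the full 2D extent of the data variables.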

Apr 30 '21 09:04 forman

Viewing the dataset will result in a DeveloperError: Width must be less than or equal to the maximum texture size (16384). Check maximumTextureSize. This error probably happens due to the massive size of the dataset (157500 * 405000).
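
To put numbers on this: a rough sketch of how much the data would have to be downsampled to fit into a single texture of that size. This is not how cate handles it; it is just a generic xarray `coarsen` illustration on a tiny made-up array, using the limit and dataset size quoted above.

```python
import math

import numpy as np
import xarray as xr

MAX_TEXTURE_SIZE = 16384                  # limit quoted in the error message
full_height, full_width = 157500, 405000  # dataset size quoted above

# Smallest integer factor that brings the larger side under the limit.
factor = math.ceil(max(full_height, full_width) / MAX_TEXTURE_SIZE)
print(factor, full_width // factor, full_height // factor)

# Demonstrate the block averaging on a tiny synthetic array (values are made up).
small = xr.DataArray(
    np.arange(50 * 100, dtype="float64").reshape(50, 100),
    dims=("lat", "lon"),
)
coarse = small.coarsen(lat=factor, lon=factor, boundary="trim").mean()
print(coarse.shape)
```

With these numbers the width would need to shrink by a factor of 25, which gives 16200 x 6300 pixels.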

Jul 19 '21 09:07 TonioF

This comment is invalid because of a wrong URL:

~~I see different errors:~~ ~~All three approaches result in the same error message (using zarr, xarray and xcube) with anonymous access:~~

~~ClientConnectorError: Cannot connect to host cci-ke-o.s3.jc.rl.ac.uk:80 ssl:default [Connect call failed ('172.17.2.151', 80)]~~

~~Or is that cube not publicly accessible yet?~~

Jul 20 '21 12:07 AliceBalfanz

When opening BIOMASS with the newest xcube, it works fine with open_dataset:

from xcube.core.dsio import open_dataset, open_cube
ds = open_dataset("https://cci-ke-o.s3-ext.jc.rl.ac.uk:8443/esacci/ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED-1978-2020-fv05.3.zarr", s3_kwargs=dict(anon=True))

When opening it with open_cube an error occurs:

Jul 21 '21 07:07 AliceBalfanz

Mar 28 '22 09:03 forman