Support BIOMASS dataset in Zarr Data Store
The Zarr Data Store contains data from the ODP that has been converted into the Zarr format. There is one BIOMASS dataset in the Zarr Data Store. For the purposes of this issue, we say that a dataset is supported when:
- it can be opened in cate
- it can be opened in cate with a spatial subset
- its content can be written to disk
- its data can be displayed in cate
The BIOMASS dataset cannot be opened with a spatial subset. The traceback is:
[2021-04-29 08:49:29] Request: open_dataset(datasetid=ESACCI-BIOMASS-L4-AGB-MERGED-100m-2010-2018-fv2.0.zarr, time_range=('2017-01-01', '2017-01-01'), var_names=['agb', 'agb_se'], region=[123.5265, 60.20374, 123.52827, 60.20552])
Traceback (most recent call last):
  File "test_cci_data_support.py", line 327, in test_open_ds
    dataset, _ = open_dataset(dataset_id=data_id,
  File "/home/users/tfincke/Projects/cate/cate/core/ds.py", line 432, in open_dataset
    dataset = select_subset(dataset, **subset_args)
  File "/home/users/tfincke/Projects/xcube/xcube/core/select.py", line 37, in select_subset
    dataset = select_spatial_subset(dataset, xy_bbox=bbox)
  File "/home/users/tfincke/Projects/xcube/xcube/core/select.py", line 85, in select_spatial_subset
    geo_coding = geo_coding if geo_coding is not None else GeoCoding.from_dataset(dataset, xy_names=xy_names)
  File "/home/users/tfincke/Projects/xcube/xcube/core/geocoding.py", line 132, in from_dataset
    return cls.from_xy((x, y), xy_names=(x_name, y_name))
  File "/home/users/tfincke/Projects/xcube/xcube/core/geocoding.py", line 169, in from_xy
    x, is_lon_normalized = _maybe_normalise_2d_lon(x)
  File "/home/users/tfincke/Projects/xcube/xcube/core/geocoding.py", line 462, in _maybe_normalise_2d_lon
    if _is_crossing_antimeridian(lon_var):
  File "/home/users/tfincke/Projects/xcube/xcube/core/geocoding.py", line 457, in _is_crossing_antimeridian
    return abs(lon_var.diff(dim=dim_x)).max() > 180.0 or
  File "/home/users/tfincke/miniconda3/envs/xcube/lib/python3.8/site-packages/xarray/core/dataarray.py", line 3107, in diff
    ds = self._to_temp_dataset().diff(n=n, dim=dim, label=label)
  File "/home/users/tfincke/miniconda3/envs/xcube/lib/python3.8/site-packages/xarray/core/dataset.py", line 5489, in diff
    variables[name] = var.isel(**kwargs_end) - var.isel(**kwargs_start)
  File "/home/users/tfincke/miniconda3/envs/xcube/lib/python3.8/site-packages/xarray/core/variable.py", line 2301, in func
    f(self_data, other_data)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 475. GiB for an array with shape (157500, 404999) and data type float64
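For reference, the 475 GiB figure can be reproduced from the shape and dtype in the error message. The shape (157500, 404999) is consistent with taking `diff` along x on a 2D longitude array of 157500 x 405000 values, which drops one column. The snippet below is only a back-of-the-envelope check; the 8-byte item size is the standard size of float64, and nothing else is assumed.

```python
# Shape and dtype are copied from the _ArrayMemoryError message.
shape = (157500, 404999)
FLOAT64_ITEMSIZE = 8  # bytes per float64 value

n_values = shape[0] * shape[1]
n_bytes = n_values * FLOAT64_ITEMSIZE
n_gib = n_bytes / 2**30  # GiB, the unit numpy reports

print(f"{n_values} values, {n_gib:.1f} GiB")
```

So the failure comes from materialising a single dense array over the full grid, independent of how small the requested region is.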
Should be fixed in cate 3.0 by https://github.com/dcs4cop/xcube/issues/442
Viewing the dataset will result in a DeveloperError: Width must be less than or equal to the maximum texture size (16384). Check maximumTextureSize. This error probably happens due to the massive size of the dataset (157500 * 405000).
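To put the texture limit in perspective, here is a rough sketch of how much the grid would have to be coarsened for both dimensions to fit within 16384. This only does the arithmetic on the numbers quoted above; whether cate should downsample, tile, or do something else is not decided here, and `min_coarsening_factor` is a hypothetical helper, not an existing cate or xcube function.

```python
import math

# Values quoted in this issue; downsampling as a workaround is an assumption.
MAX_TEXTURE_SIZE = 16384
height, width = 157500, 405000


def min_coarsening_factor(height: int, width: int, limit: int) -> int:
    """Smallest integer step so that both dimensions fit within `limit`."""
    return max(math.ceil(height / limit), math.ceil(width / limit))


factor = min_coarsening_factor(height, width, MAX_TEXTURE_SIZE)
coarse_shape = (math.ceil(height / factor), math.ceil(width / factor))
print(factor, coarse_shape)
```

With these numbers the width is the limiting dimension, so displaying the full extent as a single texture would need roughly every 25th pixel in each direction.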
This comment is invalid because it used the wrong URL:
~~I see different errors:~~ ~~All three approaches result in the same error message (using zarr, xarray and xcube) with anonymous access:~~
~~ClientConnectorError: Cannot connect to host cci-ke-o.s3.jc.rl.ac.uk:80 ssl:default [Connect call failed ('172.17.2.151', 80)]~~
~~Or is that cube not publicly accessible yet?~~
When opening BIOMASS with the newest xcube, it works fine with open_dataset:
from xcube.core.dsio import open_dataset, open_cube
ds = open_dataset("https://cci-ke-o.s3-ext.jc.rl.ac.uk:8443/esacci/ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED-1978-2020-fv05.3.zarr", s3_kwargs=dict(anon=True))

When opening it with open_cube an error occurs:

