Slicing not supported with Indexer when trying to open many MUR files.
xref: https://github.com/nsidc/earthaccess/issues/903
I'm trying to open 10 years of high-resolution MUR data using earthaccess. The upcoming example for 1 month (32 granules) works fine, but when I try to open ~1000 granules I get the following error:
```
File /srv/conda/envs/notebook/lib/python3.11/site-packages/xarray/core/indexing.py:1031, in apply_indexer(indexable, indexer)
   1029     return indexable.vindex[indexer]
   1030 elif isinstance(indexer, OuterIndexer):
-> 1031     return indexable.oindex[indexer]
   1032 else:
   1033     return indexable[indexer]

File /srv/conda/envs/notebook/lib/python3.11/site-packages/xarray/core/indexing.py:369, in IndexCallable.__getitem__(self, key)
    368 def __getitem__(self, key: Any) -> Any:
--> 369     return self.getter(key)

File /srv/conda/envs/notebook/lib/python3.11/site-packages/xarray/core/indexing.py:1508, in NumpyIndexingAdapter._oindex_get(self, indexer)
   1506 def _oindex_get(self, indexer: OuterIndexer):
   1507     key = _outer_to_numpy_indexer(indexer, self.array.shape)
-> 1508     return self.array[key]

File /srv/conda/envs/notebook/lib/python3.11/site-packages/virtualizarr/manifests/array.py:214, in ManifestArray.__getitem__(self, key)
    212     return self
    213 else:
--> 214     raise NotImplementedError(f"Doesn't support slicing with {indexer}")

NotImplementedError: Doesn't support slicing with (array([....]))
```
I'm not sure if there is something wrong with the data (a misaligned grid in one granule, or something like that). @TomNicholas mentioned that this may be related to #51.
The full example (earthaccess has to be installed from source):
```python
import earthaccess

# NASA JPL Multiscale Ultrahigh Resolution (MUR) Sea Surface Temperature (SST)
# dataset - 0.01 degree resolution
results = []
for year in range(2010, 2020):
    seasonal_granules = earthaccess.search_data(
        temporal=(f"{year}-12", f"{year + 1}-02"),
        short_name="MUR-JPL-L4-GLOB-v4.1",
        cloud_hosted=True,
    )
    results.extend(seasonal_granules)

len(results)
```
```
913
```
and then:

```python
%%time
mur = earthaccess.open_virtual_mfdataset(
    results,
    access="indirect",
    load=True,  # it also failed with load=False
    parallel=True,
    concat_dim="time",
    coords="minimal",
    compat="override",
    combine_attrs="drop_conflicts",
)
mur
```
I'm not sure why this operation requires indexing at all...
Can you open the files individually and then concatenate them to try to narrow down where / for which dataset the error is occurring?
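Something along these lines could locate the first pair that fails to combine (just a sketch; it assumes the granules have already been opened individually into a list called `vdatasets`, as in the snippet below):

```python
import xarray as xr

# Try combining each adjacent pair and stop at the first one that fails,
# to pinpoint where the schemas diverge.
for i in range(len(vdatasets) - 1):
    try:
        xr.combine_nested(
            vdatasets[i : i + 2],
            concat_dim="time",
            coords="minimal",
            compat="override",
            combine_attrs="drop_conflicts",
        )
    except NotImplementedError as err:
        print(f"combine failed between datasets {i} and {i + 1}: {err}")
        break
```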
So I took that approach and narrowed it down a bit. This code snippet uses the same results variable from @betolink's example above and reaches the same error with just two xr.Datasets. The main difference I found between the two datasets is that one has an extra variable (dt_1km_data), which is also shown below. I guess at some point the MUR netCDF product added a variable to the datasets? This presumably makes it difficult to concatenate with older references.
Also, FYI, I checked the source netCDFs and verified that one really does have an extra variable, to make sure this isn't a dmrpp reader error.
```python
import dask
import virtualizarr as vz
import xarray as xr

fs = earthaccess.get_s3_filesystem(results=results)
fs.storage_options["anon"] = False  # type: ignore

open_ = dask.delayed(vz.open_virtual_dataset)  # type: ignore

# Get list of virtual datasets (as dask delayed objects)
vdatasets = []
for g in results:
    vdatasets.append(
        open_(
            filepath=g.data_links(access="direct")[0] + ".dmrpp",
            filetype="dmrpp",  # type: ignore
            indexes={},
            reader_options={"storage_options": fs.storage_options},  # type: ignore
        )
    )
vdatasets = dask.compute(vdatasets)[0]

# Combining just these two datasets reproduces the error
xr.combine_nested(
    vdatasets[495:497],
    concat_dim="time",
    coords="minimal",
    compat="override",
    combine_attrs="drop_conflicts",
)

print(vdatasets[495])
print(vdatasets[496])
```
```
<xarray.Dataset> Size: 5GB
Dimensions:           (time: 1, lat: 17999, lon: 36000)
Coordinates:
    time              (time) int32 4B ManifestArray<shape=(1,), dtype=int32, ...
    lat               (lat) float32 72kB ManifestArray<shape=(17999,), dtype=...
    lon               (lon) float32 144kB ManifestArray<shape=(36000,), dtype...
Data variables:
    mask              (time, lat, lon) int8 648MB ManifestArray<shape=(1, 179...
    sea_ice_fraction  (time, lat, lon) int8 648MB ManifestArray<shape=(1, 179...
    dt_1km_data       (time, lat, lon) int8 648MB ManifestArray<shape=(1, 179...
    analysed_sst      (time, lat, lon) int16 1GB ManifestArray<shape=(1, 1799...
    analysis_error    (time, lat, lon) int16 1GB ManifestArray<shape=(1, 1799...
Attributes: (12/47)
    Conventions:  CF-1.5
    title:        Daily MUR SST, Final product
    summary:      A merged, multi-sensor L4 Foundation SST anal...
    references:   http://podaac.jpl.nasa.gov/Multi-scale_Ultra-...
    institution:  Jet Propulsion Laboratory
    history:      created at nominal 4-day latency; replaced nr...
    ...           ...

<xarray.Dataset> Size: 4GB
Dimensions:           (time: 1, lat: 17999, lon: 36000)
Coordinates:
    time              (time) int32 4B ManifestArray<shape=(1,), dtype=int32, ...
    lat               (lat) float32 72kB ManifestArray<shape=(17999,), dtype=...
    lon               (lon) float32 144kB ManifestArray<shape=(36000,), dtype...
Data variables:
    mask              (time, lat, lon) int8 648MB ManifestArray<shape=(1, 179...
    sea_ice_fraction  (time, lat, lon) int8 648MB ManifestArray<shape=(1, 179...
    analysed_sst      (time, lat, lon) int16 1GB ManifestArray<shape=(1, 1799...
    analysis_error    (time, lat, lon) int16 1GB ManifestArray<shape=(1, 1799...
Attributes: (12/47)
    Conventions:  CF-1.5
    title:        Daily MUR SST, Final product
    summary:      A merged, multi-sensor L4 Foundation SST anal...
    references:   http://podaac.jpl.nasa.gov/Multi-scale_Ultra-...
    institution:  Jet Propulsion Laboratory
    history:      created at nominal 4-day latency; replaced nr...
    ...           ...
```
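Since the only schema difference is the extra dt_1km_data variable, one possible workaround (just a sketch, not verified against this dataset) might be to drop that variable everywhere before combining, so every dataset shares the same set of variables:

```python
# Hypothetical workaround: align schemas by dropping the extra variable.
# errors="ignore" leaves datasets that never had dt_1km_data untouched.
vdatasets_common = [ds.drop_vars("dt_1km_data", errors="ignore") for ds in vdatasets]

combined = xr.combine_nested(
    vdatasets_common,
    concat_dim="time",
    coords="minimal",
    compat="override",
    combine_attrs="drop_conflicts",
)
```

The trade-off, of course, is losing dt_1km_data for the granules that do carry it.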
Haha, I knew this was an issue with model output (variables can appear and disappear), but it's news to me that it can happen with nominally-standard archival data. FWIW, I think xarray will try to reindex dt_1km_data onto the full time vector by indexing with [-1, -1, -1, -1, ..., N-1, N], which is why you get the error.
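For intuition, here's a minimal sketch of that mechanism with plain numpy-backed datasets (a toy example, not the MUR data): the missing variable gets reindexed onto the combined time vector and filled with NaN, which numpy arrays happily support but a ManifestArray does not.

```python
import xarray as xr

# Two single-timestep datasets; only the first carries variable "b",
# mimicking the dt_1km_data mismatch between MUR granules.
ds1 = xr.Dataset({"a": ("time", [1.0]), "b": ("time", [2.0])}, coords={"time": [0]})
ds2 = xr.Dataset({"a": ("time", [3.0])}, coords={"time": [1]})

# concat reindexes "b" onto the combined time vector, filling the gap with
# NaN. Numpy arrays accept the underlying fancy indexing; a ManifestArray
# raises NotImplementedError at the same step.
combined = xr.concat([ds1, ds2], dim="time", data_vars="all")
print(combined["b"].values)  # [ 2. nan]
```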
@dcherian Yup, the error output if I change the range of results to [545:555] is:
```
NotImplementedError: Doesn't support slicing with (array([-1, -1, -1, 0, 1, 2, 3, 4, 5, 6]), slice(None, None, None), slice(None, None, None))
```
Huh. I'm not sure I fully understand yet - is this an example of trying to auto-generate indexes? Or an xarray "virtual variable" thing (not a virtual variable in the virtualizarr sense)?