VirtualiZarr
VirtualiZarr copied to clipboard
Failing test to due to upstream indexer changes
virtualizarr/tests/test_xarray.py::TestConcat::test_concat_along_new_dim started failing after https://github.com/pydata/xarray/pull/10277 was merged.
this actually looks like a bug in virtualizarr. The problematic part is this: https://github.com/zarr-developers/VirtualiZarr/blob/43209b32b8d84f7f57f91942c5b8d3b6cf2db0f4/virtualizarr/manifests/array.py#L241-L247 where the error message says that None would be fine as a tuple element but the condition checks for dim_indexer is None. I'd rewrite the condition to
if not isinstance(dim_indexer, (int, slice, np.ndarray)) and dim_indexer not in (None, Ellipsis):
(could also just change the second part to and dim_indexer is not None, but Ellipsis is also a singleton)
With xarray just being released, I think we need to address this soon. I tried the suggestion above and it didn't work as-is..
There's more work needed following up on #612. For example, this fails:
import xarray as xr
import warnings
warnings.filterwarnings("ignore",
message="Numcodecs codecs are not in the Zarr version 3 specification*",
category=UserWarning)
xr.set_options(display_style="html")
from urllib.parse import urlparse
import obstore as obs
import xarray as xr
from virtualizarr import open_virtual_dataset
from virtualizarr.parsers.hdf.hdf import Parser as HDFParser
file_urls = [
"s3://smn-ar-wrf/DATA/WRF/DET/2022/12/31/12/WRFDETAR_01H_20221231_12_000.nc",
"s3://smn-ar-wrf/DATA/WRF/DET/2022/12/31/12/WRFDETAR_01H_20221231_12_001.nc",
]
parsed = urlparse(file_urls[0])
s3store = obs.store.S3Store(
bucket=parsed.netloc,
client_options={"allow_http": True},
skip_signature=True,
virtual_hosted_style_request=False,
region="us-west-2",
)
virtual_datasets = [
open_virtual_dataset(file_url=url, parser=HDFParser(), object_store=s3store)
for url in file_urls
]
virtual_ds = xr.concat(virtual_datasets, dim='time', coords='minimal', compat='override')
print(repr(virtual_ds))
Traceback (most recent call last):
File "/Users/max/Documents/Code/zarr-developers/VirtualiZarr/.pixi/envs/docs/lib/python3.13/site-packages/markdown_exec/_internal/formatters/python.py", line 71, in _run_python
exec_python(code, code_block_id, exec_globals)
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/max/Documents/Code/zarr-developers/VirtualiZarr/.pixi/envs/docs/lib/python3.13/site-packages/markdown_exec/_internal/formatters/_exec_python.py", line 8, in exec_python
exec(compiled, exec_globals) # noqa: S102
~~~~^^^^^^^^^^^^^^^^^^^^^^^^
File "<code block: session intro; n14>", line 27, in <module>
virtual_ds = xr.concat(virtual_datasets, dim='time', coords='minimal', compat='override')
File "/Users/max/Documents/Code/zarr-developers/VirtualiZarr/.pixi/envs/docs/lib/python3.13/site-packages/xarray/structure/concat.py", line 277, in concat
return _dataset_concat(
objs,
...<8 lines>...
create_index_for_new_dim=create_index_for_new_dim,
)
File "/Users/max/Documents/Code/zarr-developers/VirtualiZarr/.pixi/envs/docs/lib/python3.13/site-packages/xarray/structure/concat.py", line 676, in _dataset_concat
combined_var = concat_vars(
vars, dim_name, positions, combine_attrs=combine_attrs
)
File "/Users/max/Documents/Code/zarr-developers/VirtualiZarr/.pixi/envs/docs/lib/python3.13/site-packages/xarray/core/variable.py", line 3018, in concat
variables = list(variables)
File "/Users/max/Documents/Code/zarr-developers/VirtualiZarr/.pixi/envs/docs/lib/python3.13/site-packages/xarray/structure/concat.py", line 598, in ensure_common_dims
var = var.set_dims(common_dims, common_shape)
File "/Users/max/Documents/Code/zarr-developers/VirtualiZarr/.pixi/envs/docs/lib/python3.13/site-packages/xarray/util/deprecation_helpers.py", line 143, in wrapper
return func(*args, **kwargs)
File "/Users/max/Documents/Code/zarr-developers/VirtualiZarr/.pixi/envs/docs/lib/python3.13/site-packages/xarray/core/variable.py", line 1395, in set_dims
expanded_data = self.data[indexer]
~~~~~~~~~^^^^^^^^^
File "/Users/max/Documents/Code/zarr-developers/VirtualiZarr/virtualizarr/manifests/array.py", line 261, in __getitem__
indexer = _possibly_expand_trailing_ellipsis(indexer, self.ndim)
File "/Users/max/Documents/Code/zarr-developers/VirtualiZarr/virtualizarr/manifests/array.py", line 363, in _possibly_expand_trailing_ellipsis
extra_slices_needed = ndim - (len(indexer) - 1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Invalid indexer for array with ndim=0: (None, Ellipsis)
The ndim=0 is very suspicious.
~~The snippet above works if I use just dim="time" dropping the coords and compat arguments. Since the code previously worked with those options, it'd be helpful to get a sanity check from @TomNicholas about the expected behavior.~~
Scratch that, it does still require the compat and coords arguments to avoid the missing chunk manager error reported elsewhere.
So does this work? Can this be closed?
So does this work? Can this be closed?
It does not work.
I am also hitting this issue with latest release of xarray. :(
#612 and #641 should now cover both the indexing cases introduced by https://github.com/pydata/xarray/pull/10277.