VirtualiZarr icon indicating copy to clipboard operation
VirtualiZarr copied to clipboard

Failing test to due to upstream indexer changes

Open maxrjones opened this issue 6 months ago • 1 comments
trafficstars

virtualizarr/tests/test_xarray.py::TestConcat::test_concat_along_new_dim started failing after https://github.com/pydata/xarray/pull/10277 was merged.

maxrjones avatar May 15 '25 20:05 maxrjones

this actually looks like a bug in virtualizarr. The problematic part is this: https://github.com/zarr-developers/VirtualiZarr/blob/43209b32b8d84f7f57f91942c5b8d3b6cf2db0f4/virtualizarr/manifests/array.py#L241-L247 where the error message says that None would be fine as a tuple element but the condition checks for dim_indexer is None. I'd rewrite the condition to

if not isinstance(dim_indexer, (int, slice, np.ndarray)) and dim_indexer not in (None, Ellipsis):

(could also just change the second part to and dim_indexer is not None, but Ellipsis is also a singleton)

keewis avatar May 17 '25 20:05 keewis

With xarray just being released, I think we need to address this soon. I tried the suggestion above and it didn't work as-is..

maxrjones avatar Jun 10 '25 19:06 maxrjones

There's more work needed following up on #612. For example, this fails:

import xarray as xr
import warnings
warnings.filterwarnings("ignore",
                       message="Numcodecs codecs are not in the Zarr version 3 specification*",
                       category=UserWarning)
xr.set_options(display_style="html")

from urllib.parse import urlparse

import obstore as obs
import xarray as xr

from virtualizarr import open_virtual_dataset
from virtualizarr.parsers.hdf.hdf import Parser as HDFParser

file_urls = [
    "s3://smn-ar-wrf/DATA/WRF/DET/2022/12/31/12/WRFDETAR_01H_20221231_12_000.nc",
    "s3://smn-ar-wrf/DATA/WRF/DET/2022/12/31/12/WRFDETAR_01H_20221231_12_001.nc",
]

parsed = urlparse(file_urls[0])
s3store = obs.store.S3Store(
    bucket=parsed.netloc,
    client_options={"allow_http": True},
    skip_signature=True,
    virtual_hosted_style_request=False,
    region="us-west-2",
)
virtual_datasets = [
    open_virtual_dataset(file_url=url, parser=HDFParser(), object_store=s3store)
    for url in file_urls
]

virtual_ds = xr.concat(virtual_datasets, dim='time', coords='minimal', compat='override')
print(repr(virtual_ds))
 Traceback (most recent call last):
   File "/Users/max/Documents/Code/zarr-developers/VirtualiZarr/.pixi/envs/docs/lib/python3.13/site-packages/markdown_exec/_internal/formatters/python.py", line 71, in _run_python
     exec_python(code, code_block_id, exec_globals)
     ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/Users/max/Documents/Code/zarr-developers/VirtualiZarr/.pixi/envs/docs/lib/python3.13/site-packages/markdown_exec/_internal/formatters/_exec_python.py", line 8, in exec_python
     exec(compiled, exec_globals)  # noqa: S102
     ~~~~^^^^^^^^^^^^^^^^^^^^^^^^
   File "<code block: session intro; n14>", line 27, in <module>
     virtual_ds = xr.concat(virtual_datasets, dim='time', coords='minimal', compat='override')
   File "/Users/max/Documents/Code/zarr-developers/VirtualiZarr/.pixi/envs/docs/lib/python3.13/site-packages/xarray/structure/concat.py", line 277, in concat
     return _dataset_concat(
         objs,
     ...<8 lines>...
         create_index_for_new_dim=create_index_for_new_dim,
     )
   File "/Users/max/Documents/Code/zarr-developers/VirtualiZarr/.pixi/envs/docs/lib/python3.13/site-packages/xarray/structure/concat.py", line 676, in _dataset_concat
     combined_var = concat_vars(
         vars, dim_name, positions, combine_attrs=combine_attrs
     )
   File "/Users/max/Documents/Code/zarr-developers/VirtualiZarr/.pixi/envs/docs/lib/python3.13/site-packages/xarray/core/variable.py", line 3018, in concat
     variables = list(variables)
   File "/Users/max/Documents/Code/zarr-developers/VirtualiZarr/.pixi/envs/docs/lib/python3.13/site-packages/xarray/structure/concat.py", line 598, in ensure_common_dims
     var = var.set_dims(common_dims, common_shape)
   File "/Users/max/Documents/Code/zarr-developers/VirtualiZarr/.pixi/envs/docs/lib/python3.13/site-packages/xarray/util/deprecation_helpers.py", line 143, in wrapper
     return func(*args, **kwargs)
   File "/Users/max/Documents/Code/zarr-developers/VirtualiZarr/.pixi/envs/docs/lib/python3.13/site-packages/xarray/core/variable.py", line 1395, in set_dims
     expanded_data = self.data[indexer]
                     ~~~~~~~~~^^^^^^^^^
   File "/Users/max/Documents/Code/zarr-developers/VirtualiZarr/virtualizarr/manifests/array.py", line 261, in __getitem__
     indexer = _possibly_expand_trailing_ellipsis(indexer, self.ndim)
   File "/Users/max/Documents/Code/zarr-developers/VirtualiZarr/virtualizarr/manifests/array.py", line 363, in _possibly_expand_trailing_ellipsis
     extra_slices_needed = ndim - (len(indexer) - 1)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ValueError: Invalid indexer for array with ndim=0: (None, Ellipsis)

The ndim=0 is very suspicious.

maxrjones avatar Jun 16 '25 19:06 maxrjones

~~The snippet above works if I use just dim="time" dropping the coords and compat arguments. Since the code previously worked with those options, it'd be helpful to get a sanity check from @TomNicholas about the expected behavior.~~ Scratch that, it does still require the compat and coords arguments to avoid the missing chunk manager error reported elsewhere.

maxrjones avatar Jun 16 '25 19:06 maxrjones

So does this work? Can this be closed?

TomNicholas avatar Jun 17 '25 08:06 TomNicholas

So does this work? Can this be closed?

It does not work.

maxrjones avatar Jun 17 '25 11:06 maxrjones

I am also hitting this issue with latest release of xarray. :(

groutr avatar Jun 25 '25 14:06 groutr

#612 and #641 should now cover both the indexing cases introduced by https://github.com/pydata/xarray/pull/10277.

TomNicholas avatar Jul 04 '25 18:07 TomNicholas