Composite operations on chunked array types not dispatched through chunk manager
What happened?
The arkouda-xarray chunk manager (found here) is designed to allow Arkouda's distributed arrays to operate as chunked arrays in XArray. (Note Arkouda's array's also implement the array API, which was being used for XArray interoperability before the chunk manager was created).
When executing various operations on XArray DataArrays or DataSets that are backed by the Arkouda chunk manager, I noticed that they are not being routed through the chunk manager itself. For example, calling myArkoudaBackedArray.mean(axis="some-axis") does not get routed through ArkoudaManager.reduction. However, the operation is successful and produces the correct result, indicating that the array-api implementation of mean is being used instead (i.e., the arkouda-backed array is being treated as a duck array).
https://github.com/jeremiah-corrado/arkouda-xarray-seasonal-avgs-example/blob/main/README.md contains a full example where I observed the above behavior. It includes instructions to get Arkouda set up for XArray interoperability.
pinging @TomNicholas, since we discussed this the other day.
MVCE confirmation
- [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
- [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
No response
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:34:54) [Clang 16.0.6 ] python-bits: 64 OS: Darwin OS-release: 22.6.0 machine: arm64 processor: arm byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: None
xarray: 2024.5.0 pandas: 2.2.2 numpy: 1.26.4 scipy: 1.13.1 netCDF4: None pydap: None h5netcdf: None h5py: 3.8.0 zarr: None cftime: 1.6.3 nc_time_axis: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.8.4 cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 70.0.0 pip: 24.0 conda: None pytest: None mypy: None IPython: 8.24.0 sphinx: None
looking at the docstring of the chunk manager base class, it appears this is not used for all reductions: https://github.com/pydata/xarray/blob/d26144db962d9d620c7f7c4e7398c2f6b3bf6d47/xarray/namedarray/parallelcompat.py#L386-L390
In other words, this seems to be expected behavior? I wonder if var.reduce(np.mean) would go through the reduction method, though.
In other words, this seems to be expected behavior?
Reminding myself of how this works, I think yes, this is expected. The array type knows how to do a chunked mean on a chunked array without any further guidance from xarray, so it's fine to just use the array API to call the correct implementation of mean.
I wonder if
var.reduce(np.mean)would go through the reduction method, though.
It won't (see NamedArray.reduce)
https://github.com/pydata/xarray/blob/cde720f21788a1dbff2bdb0775a56b3ee6da6b1a/xarray/namedarray/core.py#L914
but I also don't think it needs to either. reduce is really a special case of the more general reduction primitive. reduction accepts the func kwarg that reduce accepts, but also accepts combine_func and aggregate_func. Basically these allow tree-reductions made of alternating steps, rather than just the same function applied over and over. This comes from dask.array.reduction, and is needed in the chunkmanager because it is used in xarray (only once though apparently, in the implementation .first and .last methods).
https://github.com/pydata/xarray/blob/cde720f21788a1dbff2bdb0775a56b3ee6da6b1a/xarray/core/duck_array_ops.py#L795
If arkouda chose not to implement ArkoudaChunkManager.reduction I think the only thing that would break would be the .first and .last methods. .reduce would in general work, but only if the function you passed into .reduce knew how to handle the Arkouda Array type.
So actually other than documenting / testing this a bit better I believe there is not actually a bug to be fixed here?
I think the only thing that would break would be the .first and .last methods. .reduce would in general work, but only if the function you passed into .reduce knew how to handle the Arkouda Array type.
Okay that makes sense. Thank you both for digging into the implementation/docs to figure that out. I'll work on implementing reduction properly in Arkouda's chunk manager so that those methods work.
I believe there is not actually a bug to be fixed here?
I think that's correct