xarray
xarray copied to clipboard
array api-related upstream-dev failures
- [x] towards #8844
This "fixes" the upstream-dev failures related to the removal of numpy.array_api
. There are a couple of open questions, though:
-
array-api-strict
is not installed by default, sonamedarray
would get a new dependency. Not sure how to deal with that – as far as I can tell,numpy.array_api
was not supposed to be used that way, so maybe we need to usearray-api-compat
instead? What do you think, @andersy005, @Illviljan? -
array-api-strict
does not defineArray.nbytes
(causing a funny exception that wrongly claimsDataArray
does not definenbytes
) -
array-api-strict
has a differentDType
class, which makes it tricky to work with bothnumpy
dtypes and said dtype class in the same code. In particular, if I understand correctly we're supposed to check dtypes usingisdtype
, butnumpy.isdtype
will only exist innumpy>=2
,array-api-strict
's version does not define datetime / string / object dtypes, andnumpy.issubdtype
does not work with the non-numpy
dtype class). So maybe we need to usearray-api-compat
internally?
My intention with the tests in namedarray were intended to be using the strict array, so I think array-api-strict
makes sense there. So if I failed to use the strict version properly we should fix those.
It should not be required for anything else than testing though.
> arrayapi_a = nxp.asarray([2.1, 4], dtype=np.dtype(np.int64))
/home/runner/work/xarray/xarray/xarray/tests/test_namedarray.py:364:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/home/runner/micromamba/envs/xarray-tests/lib/python3.12/site-packages/array_api_strict/_creation_functions.py:55: in asarray
_check_valid_dtype(dtype)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dtype = dtype('int64')
def _check_valid_dtype(dtype):
# Note: Only spelling dtypes as the dtype objects is supported.
if dtype not in (None,) + _all_dtypes:
> raise ValueError("dtype must be one of the supported dtypes")
E ValueError: dtype must be one of the supported dtypes
These should probably just be nxp.asarray([2.1, 4], dtype=nxp.int64)
?
https://github.com/data-apis/array-api-strict/blob/b52232c13b39d7b081e47b7bb1a0ef3dcbd1a6ff/array_api_strict/_dtypes.py#L45
my question was about the import of numpy.array_api
in xarray.namedarray._array_api
. I removed it, but I didn't check whether nxp
is used (as far as I can tell, only the doctests fail?)
Edit: I'll apply your suggestion tomorrow
my question was about the import of
numpy.array_api
inxarray.namedarray._array_api
. I removed it, but I didn't check whethernxp
is used (as far as I can tell, only the doctests fail?)
Ah the doctests, the idea was to use an array api compliant array in the array api module. Previously numpy couldn't sometimes pass through the __array_namespace__
path so it couldn't be tested properly.
But now numpy should be array api compliant so I think it is safe to replace nxp
with np
in the doctests.
https://github.com/numpy/numpy/pull/25911
unless I missed any (which is definitely possible), I've got it down to 3 unique errors:
- we're using
dtype.kind
to check for "nullable" dtypes (float
,complex
,object
). The array api says we should usexp.isdtype
for this, but that does not exist innumpy<2
. So we'd have to dispatch to eithernp.issubdtype
orxp.isdtype
depending on availability. However, the Array API does not have object, string or datetime / timedelta dtypes (maybe more), so we can't really search for these using the Array API, which makes compatibility code tricky. - the Array API has neither
array.nbytes
nordtype.itemsize
. We've been using the latter to compute the number of bytes in the array in the absence ofnbytes
. As mentioned above, theAttributeError
foritemsize
makes the error message claim thatDataArray
(orDataset
) don't definebytes
. -
result_type
currently does not at all dispatch to the Array API at all. This should be easy to fix in comparison, so I'll get to that later this week.
Edit: I'd appreciate input on how to best resolve 1 and 2
For 2 you can calculate bytes from bits with xp.iinfo(dtype).bits // 8
For 2 you can calculate bytes from bits with
xp.iinfo(dtype).bits // 8
that works, although for floating dtypes there's xp.finfo
. I've pushed an implementation, which should probably be refactored (I couldn't find examples on how to do this, so I'm accessing __array_namespace__
directly).
yeah, here's my version I was playing around with:
import array_api_strict as xp
a = xp.asarray([1, 2, 3], dtype=xp.int8)
def itemsize(x) -> int:
"""
Length of one array element in bytes.
Parameters
----------
x :
array to get the itemsize of.
Examples
--------
>>> import array_api_strict as xp
Bool
>>> x = xp.asarray([0,1,2], dtype=xp.bool)
>>> itemsize(xp.astype(x, xp.bool))
1
Integer
>>> itemsize(xp.astype(x, xp.int8))
1
>>> itemsize(xp.astype(x, xp.int16))
2
>>> itemsize(xp.astype(x, xp.int32))
4
>>> itemsize(xp.astype(x, xp.int64))
8
Floating
>>> itemsize(xp.astype(x, xp.float32))
4
>>> itemsize(xp.astype(x, xp.float64))
8
Floating complex
>>> itemsize(xp.astype(x, xp.complex64))
4
>>> itemsize(xp.astype(x, xp.complex128))
8
"""
xp = x.__array_namespace__() if hasattr(x, "__array_namespace__") else np
dtype = x.dtype
if xp.isdtype(dtype, kind="bool"):
return 1
elif xp.isdtype(dtype, kind="integral"):
return xp.iinfo(dtype).bits // 8
else:
return xp.finfo(dtype).bits // 8
To get xp
you can use:
https://github.com/pydata/xarray/blob/1d43672574332615f225089d69f95a9f8d81d912/xarray/namedarray/_array_api.py#L33
the last round of commits should fix the final issue. I'm not at all sure my implementation is the most elegant it can be, but it at least it works (for now)
Edit: I'll integrate _get_data_namespace
and think about what to do with itemsize
some time later today.
This test should fail with xp-arrays: https://github.com/pydata/xarray/blob/84f0c9575692c38d3ae701ed00cc5d3bb4fc93e1/xarray/tests/test_array_api.py#L108-L111
Looks like a typo: assert xp_arr.nbytes == xp_arr.data.nbytes
I suppose you can use with pytest.raises
for row 111 if you don't have a better idea.
Looks like a typo:
assert xp_arr.nbytes == xp_arr.data.nbytes
This wouldn't work, because xp_arr.data
does not define nbytes
. So as far as I can tell, numpy
's answer is considered the ground truth, and everything else is compared against that.
Yes, I agree. What I meant was something like this:
def test_properties(arrays: tuple[xr.DataArray, xr.DataArray]) -> None:
np_arr, xp_arr = arrays
assert np_arr.nbytes == np_arr.data.nbytes
assert xp_arr.nbytes == np_arr.data.nbytes
with pytest.raises(AttributeError):
# xp arrays doesn't support nbytes, should fail:
assert xp_arr.nbytes == xp_arr.data.nbytes
numpy arrays are already checked on the first assert.
edit: comparing that nbytes is correctly calculated from xp_arr is good. But it is nice to clarify that xp_arr.data.nbytes
should fail as expected.
but then this should be enough, no?
with pytest.raises(AttributeError):
# xp arrays doesn't support nbytes, should fail:
xp_arr.data.nbytes
We know xp_arr.nbytes
doesn't raise since we just called it successfully in the line above.
Sure that works too!
My idea was that assert xp_arr.nbytes == xp_arr.data.nbytes
is what we really want to check. But using np_arr is a little workaround we do instead.
I opened data-apis/array-api#789 to discuss adding nbytes
(in some form) to the array API.
My idea was that
assert xp_arr.nbytes == xp_arr.data.nbytes
is what we really want to check.
I've thought about this some more, and I agree that in general that would be true. However, in the case of the tests, where the arrays are created to be exactly the same (one just uses a different API), I believe it would be better to clearly communicate the intention:
expected = np_arr.data.nbytes
assert np_arr.nbytes == expected
assert xp_arr.nbytes == expected
@shoyer, could I ask for another review? I think I changed the code to what we discussed yesterday, but I might have missed something.
the remaining two errors on the upstream-dev CI are due to a bug in numpy.isdtype
(numpy/numpy#26441). This appears to only have been an issue on numpy
's main
, not their 2.0
branch / the rc, so this should work as soon as the fix is uploaded to the nightly wheels.
looks like this should finally be ready for merging?
:clap: 🫡