xarray array api-related upstream-dev failures

[x] towards #8844

This "fixes" the upstream-dev failures related to the removal of numpy.array_api. There are a couple of open questions, though:

array-api-strict is not installed by default, so namedarray would get a new dependency. Not sure how to deal with that – as far as I can tell, numpy.array_api was not supposed to be used that way, so maybe we need to use array-api-compat instead? What do you think, @andersy005, @Illviljan?
array-api-strict does not define Array.nbytes (causing a funny exception that wrongly claims DataArray does not define nbytes)
array-api-strict has a different DType class, which makes it tricky to work with both numpy dtypes and said dtype class in the same code. In particular, if I understand correctly we're supposed to check dtypes using isdtype, but numpy.isdtype will only exist in numpy>=2, array-api-strict's version does not define datetime / string / object dtypes, and numpy.issubdtype does not work with the non-numpy dtype class). So maybe we need to use array-api-compat internally?

Mar 19 '24 13:03 keewis

My intention with the tests in namedarray were intended to be using the strict array, so I think array-api-strict makes sense there. So if I failed to use the strict version properly we should fix those.

It should not be required for anything else than testing though.

>       arrayapi_a = nxp.asarray([2.1, 4], dtype=np.dtype(np.int64))

/home/runner/work/xarray/xarray/xarray/tests/test_namedarray.py:364: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/home/runner/micromamba/envs/xarray-tests/lib/python3.12/site-packages/array_api_strict/_creation_functions.py:55: in asarray
    _check_valid_dtype(dtype)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

dtype = dtype('int64')

    def _check_valid_dtype(dtype):
        # Note: Only spelling dtypes as the dtype objects is supported.
        if dtype not in (None,) + _all_dtypes:
>           raise ValueError("dtype must be one of the supported dtypes")
E           ValueError: dtype must be one of the supported dtypes

These should probably just be nxp.asarray([2.1, 4], dtype=nxp.int64) ? https://github.com/data-apis/array-api-strict/blob/b52232c13b39d7b081e47b7bb1a0ef3dcbd1a6ff/array_api_strict/_dtypes.py#L45

Mar 21 '24 20:03 Illviljan

my question was about the import of numpy.array_api in xarray.namedarray._array_api. I removed it, but I didn't check whether nxp is used (as far as I can tell, only the doctests fail?)

Edit: I'll apply your suggestion tomorrow

Mar 21 '24 23:03 keewis

my question was about the import of numpy.array_api in xarray.namedarray._array_api. I removed it, but I didn't check whether nxp is used (as far as I can tell, only the doctests fail?)

Ah the doctests, the idea was to use an array api compliant array in the array api module. Previously numpy couldn't sometimes pass through the __array_namespace__ path so it couldn't be tested properly. But now numpy should be array api compliant so I think it is safe to replace nxp with np in the doctests. https://github.com/numpy/numpy/pull/25911

Mar 22 '24 06:03 Illviljan

unless I missed any (which is definitely possible), I've got it down to 3 unique errors:

we're using dtype.kind to check for "nullable" dtypes (float, complex, object). The array api says we should use xp.isdtype for this, but that does not exist in numpy<2. So we'd have to dispatch to either np.issubdtype or xp.isdtype depending on availability. However, the Array API does not have object, string or datetime / timedelta dtypes (maybe more), so we can't really search for these using the Array API, which makes compatibility code tricky.
the Array API has neither array.nbytes nor dtype.itemsize. We've been using the latter to compute the number of bytes in the array in the absence of nbytes. As mentioned above, the AttributeError for itemsize makes the error message claim that DataArray (or Dataset) don't define bytes.
result_type currently does not at all dispatch to the Array API at all. This should be easy to fix in comparison, so I'll get to that later this week.

Edit: I'd appreciate input on how to best resolve 1 and 2

Apr 10 '24 21:04 keewis

For 2 you can calculate bytes from bits with xp.iinfo(dtype).bits // 8

Apr 11 '24 20:04 Illviljan

For 2 you can calculate bytes from bits with xp.iinfo(dtype).bits // 8

that works, although for floating dtypes there's xp.finfo. I've pushed an implementation, which should probably be refactored (I couldn't find examples on how to do this, so I'm accessing __array_namespace__ directly).

Apr 12 '24 17:04 keewis

yeah, here's my version I was playing around with:

import array_api_strict as xp

a = xp.asarray([1, 2, 3], dtype=xp.int8)


def itemsize(x) -> int:
    """
    Length of one array element in bytes.

    Parameters
    ----------
    x :
        array to get the itemsize of.

    Examples
    --------
    >>> import array_api_strict as xp

    Bool

    >>> x = xp.asarray([0,1,2], dtype=xp.bool)
    >>> itemsize(xp.astype(x, xp.bool))
    1

    Integer

    >>> itemsize(xp.astype(x, xp.int8))
    1
    >>> itemsize(xp.astype(x, xp.int16))
    2
    >>> itemsize(xp.astype(x, xp.int32))
    4
    >>> itemsize(xp.astype(x, xp.int64))
    8

    Floating

    >>> itemsize(xp.astype(x, xp.float32))
    4
    >>> itemsize(xp.astype(x, xp.float64))
    8

    Floating complex

    >>> itemsize(xp.astype(x, xp.complex64))
    4
    >>> itemsize(xp.astype(x, xp.complex128))
    8
    """
    xp = x.__array_namespace__() if hasattr(x, "__array_namespace__") else np
    dtype = x.dtype

    if xp.isdtype(dtype, kind="bool"):
        return 1
    elif xp.isdtype(dtype, kind="integral"):
        return xp.iinfo(dtype).bits // 8
    else:
        return xp.finfo(dtype).bits // 8

To get xp you can use: https://github.com/pydata/xarray/blob/1d43672574332615f225089d69f95a9f8d81d912/xarray/namedarray/_array_api.py#L33

Apr 12 '24 17:04 Illviljan

the last round of commits should fix the final issue. I'm not at all sure my implementation is the most elegant it can be, but it at least it works (for now)

Edit: I'll integrate _get_data_namespace and think about what to do with itemsize some time later today.

Apr 12 '24 18:04 keewis

This test should fail with xp-arrays: https://github.com/pydata/xarray/blob/84f0c9575692c38d3ae701ed00cc5d3bb4fc93e1/xarray/tests/test_array_api.py#L108-L111

Looks like a typo: assert xp_arr.nbytes == xp_arr.data.nbytes

I suppose you can use with pytest.raises for row 111 if you don't have a better idea.

Apr 14 '24 15:04 Illviljan

Looks like a typo: assert xp_arr.nbytes == xp_arr.data.nbytes

This wouldn't work, because xp_arr.data does not define nbytes. So as far as I can tell, numpy's answer is considered the ground truth, and everything else is compared against that.

Apr 14 '24 17:04 keewis

Yes, I agree. What I meant was something like this:

 def test_properties(arrays: tuple[xr.DataArray, xr.DataArray]) -> None: 
     np_arr, xp_arr = arrays 
     assert np_arr.nbytes == np_arr.data.nbytes 
     assert xp_arr.nbytes == np_arr.data.nbytes 
     with pytest.raises(AttributeError):
         # xp arrays doesn't support nbytes, should fail:
         assert xp_arr.nbytes == xp_arr.data.nbytes

numpy arrays are already checked on the first assert.

edit: comparing that nbytes is correctly calculated from xp_arr is good. But it is nice to clarify that xp_arr.data.nbytes should fail as expected.

Apr 14 '24 20:04 Illviljan

but then this should be enough, no?

    with pytest.raises(AttributeError):
         # xp arrays doesn't support nbytes, should fail:
         xp_arr.data.nbytes

We know xp_arr.nbytes doesn't raise since we just called it successfully in the line above.

Apr 14 '24 20:04 keewis

Sure that works too!

My idea was that assert xp_arr.nbytes == xp_arr.data.nbytes is what we really want to check. But using np_arr is a little workaround we do instead.

Apr 14 '24 20:04 Illviljan

I opened data-apis/array-api#789 to discuss adding nbytes (in some form) to the array API.

Apr 21 '24 16:04 keewis

My idea was that assert xp_arr.nbytes == xp_arr.data.nbytes is what we really want to check.

I've thought about this some more, and I agree that in general that would be true. However, in the case of the tests, where the arrays are created to be exactly the same (one just uses a different API), I believe it would be better to clearly communicate the intention:

expected = np_arr.data.nbytes
assert np_arr.nbytes == expected
assert xp_arr.nbytes == expected

Apr 22 '24 13:04 keewis

@shoyer, could I ask for another review? I think I changed the code to what we discussed yesterday, but I might have missed something.

May 16 '24 14:05 keewis

the remaining two errors on the upstream-dev CI are due to a bug in numpy.isdtype (numpy/numpy#26441). This appears to only have been an issue on numpy's main, not their 2.0 branch / the rc, so this should work as soon as the fix is uploaded to the nightly wheels.

May 16 '24 16:05 keewis

looks like this should finally be ready for merging?

May 21 '24 20:05 keewis

:clap: 🫡

May 21 '24 22:05 dcherian

xarray xarray copied to clipboard

array api-related upstream-dev failures

xarray
xarray copied to clipboard