array-api
RFC: `item()` to return scalar for arrays with exactly 1 element.
```python
def item(self) -> Scalar:
    """If the array contains exactly one element, return it as a scalar; otherwise, raise ValueError."""
```
Examples:
- numpy.ndarray.item
- torch.Tensor.item
- pandas.Series.item
- pandas.Index.item
- polars.Series.item
- xarray.DataArray.item
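As an illustration of the proposed semantics, here is NumPy's existing `ndarray.item`, the first implementation listed above. It is only one library's behavior, not a normative reference: a size-1 array yields a plain Python scalar whose type follows the dtype, and empty or multi-element arrays raise `ValueError`.

```python
import numpy as np

# Size-1 arrays of various dtypes and shapes become plain Python scalars.
for arr in (np.array([1]), np.array([1.5]), np.array([True]), np.array([[2j]])):
    value = arr.item()
    print(type(value).__name__, value)

# Empty and multi-element arrays are rejected with ValueError.
for arr in (np.array([]), np.array([1, 2, 3])):
    try:
        arr.item()
    except ValueError as exc:
        print("ValueError:", exc)
```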
Demo:
```python
import numpy as np
import pandas as pd
import polars as pl
import pytest
import torch
import xarray as xr


@pytest.mark.parametrize("data", [[], [1, 2, 3]])
@pytest.mark.parametrize(
    "array_type", [torch.tensor, np.array, pd.Series, pd.Index, pl.Series, xr.DataArray]
)
def test_item_valueerror(data, array_type):
    # Empty and multi-element containers must raise ValueError.
    array = array_type(data)
    with pytest.raises(ValueError):
        array.item()


@pytest.mark.parametrize(
    "array_type", [torch.tensor, np.array, pd.Series, pd.Index, pl.Series, xr.DataArray]
)
def test_item(array_type):
    # A single-element container returns its only element.
    array = array_type([1])
    assert array.item() == 1
```
Currently, only torch fails, because it raises RuntimeError instead of ValueError.
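Until libraries agree on the exception type, calling code that wants uniform behavior could wrap the call. A minimal sketch; the helper name `item_strict` is hypothetical and not part of any library. It re-raises the `RuntimeError` that PyTorch uses as a `ValueError`:

```python
import numpy as np


def item_strict(array):
    """Hypothetical helper: return array.item(), always signalling bad sizes with ValueError."""
    try:
        return array.item()
    except RuntimeError as exc:
        # PyTorch raises RuntimeError (not ValueError) for tensors that are not size 1.
        # Caveat: this also converts unrelated RuntimeErrors into ValueError.
        raise ValueError(str(exc)) from exc


print(item_strict(np.array([7])))
```

The trade-off is that catching `RuntimeError` broadly may hide other failures (for example, device errors), which is one reason a consistent `ValueError` in the libraries themselves would be preferable.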
This was discussed in #710, along with the more general to_list, which also works for N-D arrays.
item() is a bit different from to_list, and honestly I find it confusing that a method named to_list can return something that is not a list.
Indeed, .item() is more constrained than to_list, and a bit cleaner. I checked other libraries: NumPy, PyTorch, JAX and CuPy implement .item(); Dask does not. (TF doesn't have it in its docs, so probably not either, but I can't check.) CuPy and JAX transfer the value to the CPU if the array is on a GPU.
This is a minor convenience method, though, since float() and friends work as well. They are clearer, since they are type-stable, and they also work for Dask. The only downside is that if you want a dtype-generic implementation that returns a single element, you have to write a small utility that calls int/float/complex/bool as appropriate. Something like:
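```python
def as_pyscalar(x):
    # Look up the array's namespace, as defined by the array API standard
    xp = x.__array_namespace__()
    # isdtype() takes a dtype, not an array
    if xp.isdtype(x.dtype, 'real floating'):
        return float(x)
    elif xp.isdtype(x.dtype, 'complex floating'):
        return complex(x)
    elif xp.isdtype(x.dtype, 'integral'):
        return int(x)
    elif xp.isdtype(x.dtype, 'bool'):
        return bool(x)
    else:
        # raise an error, or handle custom/non-standard dtypes if desired
        raise TypeError(f"unsupported dtype: {x.dtype}")
```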
Static typing of such a function, and of .item(), would also be a little annoying as it requires overloads.
item also works on arrays with multiple dimensions (as long as they have exactly one element), whereas we decided that float should not.
```python
>>> np.array([1]).item()
1
```
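To make the difference concrete with NumPy as the example library: item() accepts a size-1 array with any number of dimensions, while the standard only specifies float() for zero-dimensional arrays, so portable code would first reshape to shape `()`. This is a sketch of that workaround, not a recommendation from the discussion itself.

```python
import numpy as np

x = np.ones((1, 1, 1))  # three dimensions, exactly one element
print(x.item())         # item() ignores the number of dimensions

# The standard only guarantees float() for 0-D arrays, so drop the axes first.
x0 = np.reshape(x, ())
print(x0.ndim, float(x0))
```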
We discussed this in a call today and concluded that this falls into a bucket of functionality that is useful, but also easy to implement on top of what's already in the standard. In addition, there are problems with trying to add it: an item() method is hard, because it's missing in some libraries, and missing methods cannot be worked around in array-api-compat. If we did this, a function would be the way to go, but since no library has such a function, it would be new, hence more work, and likely to meet resistance from array library maintainers.
Outcome:
- Create the array-api-extra package where this kind of function can live, and add it there (probably as as_pyscalar or a similarly descriptive name, not as item).
- Only reconsider adding it to the standard itself in the future if most/all array libraries have already added that function.
On a very fundamental level, I believe .item() makes no sense on DataFrame-like objects (pandas.DataFrame, polars.DataFrame, pyarrow.Table, etc.) because these are designed to represent heterogeneous data types.
From a mathematical point of view, item() acts on array-like data of homogeneous type, as a representation of the natural isomorphism V → K, where V is a one-dimensional vector space over K.
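One way to make this concrete (an illustration, not part of the original comment): arrays of shape (1, ..., 1) with elements in K form a one-dimensional space with a canonical basis element e, the array whose only element is 1. Writing ι for the map item() represents, with x and y arrays of that shape and α, β scalars in K:

$$
\iota : K^{1 \times \cdots \times 1} \to K, \qquad \iota(x) = x_{0,\ldots,0}, \qquad \iota^{-1}(\lambda) = \lambda\, e,
$$

$$
\iota(\alpha x + \beta y) = \alpha\,\iota(x) + \beta\,\iota(y).
$$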
Is this usage (calling float() and friends on an array) guaranteed?
If so, should it be added somewhere to the specification? I looked for it here.
FWIW, I also like the item method, since it's all I've ever needed and it's simpler than tolist. I wonder if it should be a function in the array namespace rather than a method on the array (def item(x: Array, /) -> complex | bool), since it can be implemented using the array's public interface. (This is a common test in OO design for what should be a method versus a free function.)
Yes, __float__ and so on are guaranteed (modulo the "lazy" note). See https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.float.html#array_api.array.float. That said, Ralf's helper should also include an `if x.ndim != 1 or x.size != 1: raise ValueError` check.