Run-time version of API
Hi, first time contributor, long-time fan. I've been following the development of the Array API and having discussions with my fellow Astropy maintainers about how Astropy's Quantity class can support the API and how this might impact our users. Overall we feel that the adoption of the API will be beneficial (preaching to the choir here) but we, or at least I, have a point of concern.
Right now NumPy serves as the primary source of mathematical functions: np.cos, etc. Through its interoperability protocols, like __array_function__, we can vendor custom array classes like Quantity, and for users most of numpy just works. This API standard has a somewhat different paradigm: astropy.units will need to have a Quantity namespace that defines `cos`, etc. In some ways this is good: for Astropy, Quantity can more easily support non-numpy array objects by acting as a wrapper and forwarding actual computations to the wrapped array object. For example,
def cos(x: Quantity, /) -> Quantity:
    unit = ...
    return Quantity(x.__array_namespace__().cos(x.value), unit)
However, we worry that the loss of a central dispatching library is a regression in ease of use. Yes, numpy will continue to support __array_(u)func(tion)__ for a long time and so can still serve as a central dispatching library, but base numpy does not conform to the array API standard. So there will not be an array-API-conformant central dispatching library, and users will need to use non-conformant libraries, which seems at odds with the purpose of this standard.
The situation becomes more difficult for Astropy and users when we consider Astropy's many other array-like objects for which we might want to implement the array API: table Columns, coordinate representations, Distributions. Users will need to import a new array namespace for each — e.g. from astropy.table import column_api as cp.
I'm sure this issue has come up in discussion before. To add to that discussion, I'd like to propose what came up in ours. We felt the best solution might be if array-api published a package just of the supported functions. These functions would be thin wrappers, dispatching the actual computations to the array namespace of the argument. We know that a new Array implementation is out of scope, but hopefully this suggestion is within scope. For example:
def cos(x: ArrayAPIConformant, /) -> ArrayAPIConformant:
    return get_namespace(x).cos(x)
Now the following is possible:
>>> import array_api as ap
>>> import numpy.array_api as np
>>> x = np.linspace(1, 2, num=2)[:, None]
>>> x
Array([[1.],
       [2.]], dtype=float64)
>>> np.cos(x)  # specific library, if a user knows the input type and wants speed
Array([[ 0.54030231],
       [-0.41614684]], dtype=float64)
>>> ap.cos(x)  # universal library, for ease of use
Array([[ 0.54030231],
       [-0.41614684]], dtype=float64)
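The dispatch pattern can be exercised in miniature without any array library at all. The following is an illustrative sketch, not the proposed implementation: ToyArray is a hypothetical stand-in whose only relevant feature is `__array_namespace__`.

```python
import math
import types


class ToyArray:
    """Hypothetical stand-in for an array-API-conformant object; the only
    piece that matters here is ``__array_namespace__``."""

    def __init__(self, value):
        self.value = value

    def __array_namespace__(self, api_version=None):
        # Return a namespace whose functions operate on ToyArray objects.
        return types.SimpleNamespace(cos=lambda x: ToyArray(math.cos(x.value)))


def cos(x):
    # The thin wrapper: dispatch to the namespace of the argument.
    return x.__array_namespace__().cos(x)


print(cos(ToyArray(0.0)).value)  # 1.0
```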
For users this is especially convenient because they do not need to worry about the output types of non-standard functions. For example, a function
def angle_between(x1: ArrayAPIConformant | float, x2: ArrayAPIConformant | float) -> Quantity:
    ...
will return a Quantity regardless of the input type. Currently a user would have to switch array namespaces.
>>> x1, x2 = np.Array(...), np.Array(...)
>>> angle = angle_between(x1, x2)
>>> xp = angle.__array_namespace__()
>>> xp.cos(angle)
With `import array_api as ap` that is unnecessary (though they might still choose a specific namespace when optimizing for speed).
>>> import array_api as ap
>>> import numpy.array_api as np
>>> x1, x2 = np.Array(...), np.Array(...)
>>> angle = angle_between(x1, x2)
>>> ap.cos(angle)
Another aspect to this proposal is inclusion of a (preferably runtime-checkable) typing.Protocol of the Array API definition. In the above examples I called this ArrayAPIConformant.
In summary:
To demonstrate this proposal I have created an example implementation of the 2021.12 version of the API at https://github.com/nstarman/array_api.
This library just copies from here, adding the dispatch code, some static typing, a get_namespace function, and a run-time-checkable set of Protocols: in particular ArrayAPIConformant, which makes it easy to check that an object conforms to the array API.
Also, I compile the library with mypyc for extra performance.
In keeping with the scope of this standard, I have not included array-creation functions like ones, though it may be better to instead just have them raise a NotImplementedError with a helpful error message.
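If the latter route were taken, a dispatch-less creation function might look something like this sketch (the signature follows the standard's ones; the message text is illustrative):

```python
def ones(shape, *, dtype=None, device=None):
    # Creation functions take no array argument, so there is nothing to
    # infer a namespace from; fail loudly with a pointer to the fix.
    raise NotImplementedError(
        "ones() cannot dispatch: no array argument to infer a namespace from. "
        "Call it from a specific namespace instead, e.g. numpy.array_api.ones(...)."
    )
```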
If you are interested, I would be more than happy to submit PRs that add the dispatching, protocols, mypyc compilation, etc.
Regardless, I look forward to working with you to address these concerns so Astropy can start benefitting from the great work being done here.
ping @mhvk for this discussion.
Thanks for the detailed description, ideas and prototype @nstarman! It will take me a little while to write a more detailed reply, but here are a few initial thoughts:
but base numpy does not conform to the array API standard.
That will hopefully change in the coming year, see this message about a NumPy 2.0 proposal/roadmap.
In the meantime, https://github.com/data-apis/array-api-compat/ should hopefully be helpful for standard adoption purposes.
In keeping with the scope of this standard, I have not included array-creation functions like ones,
Creation functions like ones are included in the array API standard (see https://data-apis.org/array-api/latest/API_specification/creation_functions.html). These functions don't take an array as input, which is something that __array_ufunc__/__array_function__ cannot handle unfortunately. That is the main limitation of those protocols, and why several attempts at improvements to them have been started. What ended up in the array API standard is basically derived from NEP 37.
and a run-time checkable set of Protocols: in particular ArrayAPIConformant which makes it easy to check that an object conforms to the array-API.
That sounds cool. It may address gh-229.
I agree with your general point that a wrapper library like this could be helpful for your use-case.
I do have one technical comment on your implementation, however. Based on my experience with writing array-api-compat, this approach:
def cos(x: ArrayAPIConformant, /) -> ArrayAPIConformant:
    return get_namespace(x).cos(x)
is not a good one. The problem is that some functions do not accept an array object from which the array namespace can be deduced (the array creation functions are a great example of this, but there are others too, like the data type functions which accept dtypes). Furthermore, it leads to inherent ambiguity if a user decides to mix array objects from multiple libraries, which is not supported by the standard.
Instead, what you should do is specify the array library, which can be inferred from an array object or could just be the module object itself. And then from that return the entire array API compatible namespace. In the compat library I do something along these lines with a decorator to allow code sharing between the NumPy and CuPy compat layers, which are almost identical (see the @get_xp decorator at https://github.com/data-apis/array-api-compat/blob/main/array_api_compat/_internal.py and https://github.com/data-apis/array-api-compat/blob/main/array_api_compat/common/_aliases.py and https://github.com/data-apis/array-api-compat/blob/main/array_api_compat/numpy/_aliases.py). One could also use a bit more metaprogramming to reduce the amount of typing.
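For illustration only (this is not the actual array-api-compat code), a decorator in that spirit might bind a concrete namespace to a shared wrapper body:

```python
import functools
import math


def get_xp(xp):
    """Illustrative decorator: bind a concrete namespace ``xp`` to a shared
    wrapper body, so the same wrapper can serve several nearly-identical
    backends (e.g. a NumPy and a CuPy compat layer)."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            return f(*args, xp=xp, **kwargs)
        return wrapper
    return decorator


# One shared wrapper body, parametrized over the namespace:
def _cos(x, *, xp):
    return xp.cos(x)


# Bind it to a concrete "backend" (the stdlib math module as a toy namespace):
cos = get_xp(math)(_cos)
print(cos(0.0))  # 1.0
```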
Thanks for the quick responses @rgommers and @asmeurer
That will hopefully change in the coming year, see this message about a NumPy 2.0 proposal/roadmap.
Exciting!
In keeping with the scope of this standard, I have not included array-creation functions like ones,
Creation functions like ones are included in the array API standard (see https://data-apis.org/array-api/latest/API_specification/creation_functions.html).
Apologies for the confusion; what I meant was that since ones cannot determine a namespace to dispatch to, I didn't include it in the example library.
That sounds cool. It may address https://github.com/data-apis/array-api/issues/229.
I can submit a PR for this, but this library would have to have an installable package for the Protocols to be useful...
The problem is that some functions do not accept an array object from which the array namespace can be deduced (the array creation functions are a great example of this, but there are others too, like the data type functions which accept dtypes)
The data type functions are a good example! A point of curiosity, is there a reason why dtypes don't have the same __array_namespace__ method, which would break this ambiguity?
Furthermore, it leads to inherent ambiguity if a user decides to mix array objects from multiple libraries, which is not supported by the standard.
I think this is less of an issue. I modeled my get_namespace off the one suggested in the numpy docs
def get_namespace(*xs: Any, api_version: str | None = None) -> ArrayAPINamespace:
    # `xs` contains one or more arrays.
    namespaces = {
        x.__array_namespace__(api_version=api_version)
        for x in xs
        if isinstance(x, ArrayAPIConformant)
    }
    if not namespaces:
        raise ValueError("Unrecognized array input")
    elif len(namespaces) != 1:
        raise ValueError(f"Multiple namespaces for array inputs: {namespaces}")
    return namespaces.pop()
It will error if passed objects from multiple libraries.
While definitely a limitation, having dispatching functions will still help with the following example:
>>> import array_api as ap
>>> import numpy.array_api as np
>>> x1, x2 = np.Array(...), np.Array(...)
>>> angle = angle_between(x1, x2)
>>> ap.cos(angle)
even if not for
>>> ap.subtract(x1, angle)
The data type functions are a good example! A point of curiosity, is there a reason why dtypes don't have the same __array_namespace__ method, which would break this ambiguity?
There's also asarray(), which accepts built-in types. asarray has to know a priori which array library it corresponds to.
It will error if passed objects from multiple libraries.
But the point is that by calling it inside every function, you cannot prevent one library's array being passed to one function and another library's array being passed to another.
It's better to move the get_namespace() call as high up as possible, ideally just one level below the user, so that the first thing you do is get a namespace, and after that everything just operates on that one namespace.
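A sketch of that structure, with a toy get_namespace standing in for the real one (the stdlib math module plays the role of the resolved namespace here):

```python
import math


def get_namespace(*xs):
    # Toy stand-in: real code would resolve this via ``__array_namespace__``.
    return math


def user_facing_function(x1, x2):
    # Resolve the namespace once, just below the user...
    xp = get_namespace(x1, x2)
    # ...then everything downstream operates on that one namespace.
    return _compute(xp, x1, x2)


def _compute(xp, x1, x2):
    # Internal helpers receive ``xp`` explicitly; nothing re-dispatches,
    # so there is no ambiguity about which library's functions run.
    return xp.cos(x1 - x2)


print(user_facing_function(1.0, 1.0))  # 1.0
```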
It's better to move the get_namespace() call as high up as possible, ideally just one level below the user, so that the first thing you do is get a namespace, and after that everything just operates on that one namespace.
Thanks for the example code in the compat library! If I understand everything, each compat library is still specific to a non-array-API library, e.g. one for numpy, another for cupy. A single dispatching library is still useful. Would it be in scope to have a generic version, e.g. array_api_compat.common, that will accept ANY array-API-compatible object and dispatch to the correct library? It might even accept ndarray etc. and dispatch to array_api_compat.numpy.
@rgommers,
and a run-time checkable set of Protocols: in particular ArrayAPIConformant which makes it easy to check that an object conforms to the array-API.
That sounds cool. It may address https://github.com/data-apis/array-api/issues/229.
I can submit a PR for this, but this library would have to have an installable package for the Protocols to be useful. I'm happy to contribute to that as well, if this is in the project's scope.
Would it be in scope to have a generic version, e.g. array_api_compat.common that will accept ANY array-API compatible object and dispatch to the correct library? It might even accept ndarray etc and dispatch to array_api_compat.numpy.
I think so. The array_api_compat.get_namespace already dispatches to either the corresponding compat namespace or library. I'm not sure if this works out-of-the-box with your Quantity use-case or if we need to adjust it. Feel free to open an issue on that repo.
Thanks for all your work and thought here.
First off, heads up that as of last week we now store the Python stubs of all versions in the repo's HEAD at src/array_api_stubs, where each sub-folder represents a version of the spec, and all work will be done in src/array_api_stubs/_draft/ going forward. As you can see, this is actually organised like a package, but (right now) only for the purposes of reading stub signatures/docs in tooling (i.e. array-api-tests), so not for, say, type hinting or dispatching. Generally we still need to mull over whether we want this as a publicly-advertised package.
Your main interest here looks to be dispatching, which I can't say I know how to feel about myself right now heh, but I was interested in a type hinting package a ways back (relevant discussion). Your repo demonstrates that having a Protocol for array and dtype objects (which can be used as type hints in the stub signatures) is a rather trivial change for the current stubs, so I feel like we should just implement that for this repo, irrespective if we want a first-party type hinting package (or dispatching).
If you'd like to take a stab at a PR converting our array_api_stubs._draft.array_object._array class into a Protocol, it will help see how consortium members feel about just having a first-party package containing an array protocol, which could help discussion on a dispatching library move along. Keep in mind:
- Only modify src/array_api_stubs/_draft/ files. We can always backport things later.
- As you've seen we don't have an equivalent _dtype class, only an __eq__ function defined in data_types.py, but it did feel like something we should have, so feel free to implement a _dtype class + protocol likewise.
- ~We wouldn't want these protocols to be runtime-checkable.~
- We need to be considerate of what gets autodoc'd in spec/draft/ so the docs still render docstrings where we expect them to. I'd happily help with that stuff on review tho.
Regarding static typing, the key thing here is the array protocol I think. I'll sketch a very brief version of what you (@nstarman) implemented here:
@runtime_checkable
class ArrayAPIConformant(Protocol):
    """Runtime checkable protocol for conformance with the array API."""

    @property
    def dtype(self) -> DTypeConformant:
        ...

    @property
    def device(self) -> DeviceConformant:
        ...

    @property
    def mT(self: Self) -> Self:
        ...

    @property
    def ndim(self) -> int:
        ...
A few thoughts:
- This ArrayAPIConformant protocol seems good to go. I'd just name it plain Array (or array), like is already used for annotations in this repo as well as in jaxtyping.
  - Should we add it inside this repo, as well as figure out how best to ship it in a separate package and/or make it work with jaxtyping?
- Dealing with dtype and/or dimensionality annotation inside this protocol seems useful and should be considered, but can possibly be done as a second step. I'll note that @honno sketched Array[Bool], jaxtyping does the opposite (Float[Array] - this seems wrong conceptually), and NumPy supports NDArray[np.float64] (see the docs).
  - For the numpy implementation, see 2289292d/numpy/_typing/_generic_alias.py
- I like the examples given in the issue description here (e.g., ArrayAPIConformant | float) - no ArrayLike or some such thing.
- DTypeConformant doesn't seem viable (or useful) to define, as discussed in https://github.com/data-apis/consortium-feedback/discussions/8#discussioncomment-1705780.
- Same for DeviceConformant - it's only an arbitrary object with an __eq__ method, so might as well use Any.
So perhaps step 1 here is to open a PR to add an Array protocol to this repo? And step 2 is to extend it to Array[dtype-specifier]?
Dealing with dtype and/or dimensionality annotation inside this protocol seems useful and should be considered, but can possibly be done as a second step.
I'm just curious, but if you start to do things in later steps, won't this cause breaks in instance checks because the new protocol won't match classes that implemented the old protocol?
I know you're avoiding having a root package (although I've read some comments saying that it's still an open question), but the advantage of having a base class is that you can change things and publish a new version of the root package. Client libraries can depend on it, and there's guaranteed consistency. (And if two client libraries require different versions of the root package, that's a very helpful error suggesting that the interoperation you're expecting may not work, right?)
I know there's a lot of effort going into careful design and it looks like everything might come out perfect and so there may never need to be a new (incompatible) version of protocols, but I thought I'd mention it 😄
If you'd like to take a stab at a PR converting our array_api_stubs._draft.array_object._array class into a Protocol, it will help see how consortium members feel about just having a first-party package containing an array protocol, which could help discussion on a dispatching library move along.
So perhaps step 1 here is to open a PR to add an Array protocol to this repo? And step 2 is to extend it to Array[dtype-specifier]?
Sounds great! I think I'll have some time next week.
I'm just curious, but if you start to do things in later steps, won't this cause breaks in instance checks because the new protocol won't match classes that implemented the old protocol?
That should not be a problem. Array[float32] is more specific than plain Array, so if isinstance(x, Array) passes then it will still pass if the type annotation for x gets more specific later on.
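To illustrate: a runtime-checkable Protocol's isinstance check is purely structural (it checks for member presence, not signatures or parametrization), which is why a plain isinstance(x, Array) keeps passing as annotations become more specific. A toy example, with a deliberately minimal one-member protocol:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Array(Protocol):
    """Toy one-member protocol, for illustration only."""

    @property
    def ndim(self) -> int: ...


class MyArray:
    # No inheritance from Array needed: isinstance checks structure, not the MRO.
    @property
    def ndim(self) -> int:
        return 2


print(isinstance(MyArray(), Array))  # True
```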