array-api
array-api copied to clipboard
Add an Array Protocol & improve static typing support
Continuing from https://github.com/data-apis/array-api/issues/584, I've done a quick refactor of the array (and dtype) objects to make them typing Protocols. @rgommers, I agree that making Array generic wrt the dtype is good for a followup.
It's not perfect, but mypy doesn't complain a whole lot, so I think this is a good start. I'm happy to make refinements.
We might want to put these in a follow-up PR if just to ship the concretely non-controversial stuff first—interested in what others think.
I was thinking the same thing last evening about moving this to a followup PR.
On namespace type hints (e.g.
_CreatoinFuncs
,ArrayAPINamespace
): IMO I'm not sure on this yet, as they make contributing to the spec less ergonomic given we repeat the signatures, although they are pretty nifty to have. Is there a world where these could be dynamically construct them from the stubs instead?
Maybe? I haven't experimented yet but perhaps mypy will be satisfied with:
class ArrayAPINamespace(Protocol):
zeros_like = staticmethod(creation_funcs.zeros_like)
Also for a future PR: zeros_like
could be promoted to a Protocol
,
class Array(Protocol):
def __array_namespace__(self, ...) -> ArrayAPINamespace[Self]: ...
class ArrayAPINamespace(Protocol[Array]):
zeros_like = staticmethod(zeros_like)
class zeros_like(Protocol[Array]):
def __call__(self, x: Array, /, *, dtype: Optional[dtype] = None, device: Optional[device] = None) -> Array: ...
The advantage of this is that functions can be checked to see if their signature is compatible with zeros_like
.
@BvB93 if you have any feedback on this PR, that would be great to see.
@nstarman there are 3 errors left when running mypy array_api_stubs/_draft/array_object.py
(from lots more on main
). There are more errors in other files - but if we are going to maintain static typing support in this repo, adding a CI job running Mypy seems feasible. We could start with this file, and expand it to all files later. WDYT?
One thing that isn't nice is that
Self
now shows up instead ofarray
in the signatures of the html docs.
If the array
type is just text in the html docs and not 'clickable', then we can rename Self
to array
.
adding a CI job running Mypy seems feasible. We could start with this file, and expand it to all files later. WDYT?
That sounds like a good idea. I can try adding to the CI and create an ignore file that can later be whittled down as the package is made more type friendly.
If the
array
type is just text in the html docs and not 'clickable', then we can renameSelf
toarray
.
That works. It's not clickable - there is no array object in the standard on purpose, because in the end we don't care whether it's named ndarray
, Tensor
, or something else.
@rgommers Sorry for the long interlude. I can't seem to resolve 3 linking errors in the docs https://app.circleci.com/pipelines/github/data-apis/array-api/1088/workflows/ed2dfa75-c3c7-4845-8f28-e01fe35d0c8d/jobs/996?invite=true#step-103-1646
Sphinx is trying to create links for the type hints. I think to fix that you need to update the list(s) here https://github.com/data-apis/array-api/blob/main/src/_array_api_conf.py#L52-L77
I can squash commits if you don't use Squash & Merge (assuming this is approved, of course :) ). Thanks @asmeurer for the pointer.
I think
pre-commit
is a good addition (would love to add hooks for other spec things myself), but probably not too useful until we get a workflow that runs it. That said, I'm not too sure if we'd want to do that just yet 🤔
pre-commit.ci
can be turned on, sans additional GH workflows.
black
, and ruff
would be wonderful as well!
Looks good on my end!
Thanks @honno for the reviews and discussion.
Any reason to change array
to Array
? I think we might prefer to keep it as array
like it is now.
The class would have to be called array
, which is problematic re PEP-8 and also conflicts name-wise with the array TypeVar
. The latter can be changed, but for the former isn't it desirable to refer to array
instances of some class Array
?
The class name is specifically not part of the standard. I always felt that array
made that a little clearer than Array
, but maybe that's just me.
By the way, the protocol isn't going to require the class to be called Array
is it? If it does, that's a problem.
By the way, the protocol isn't going to require the class to be called Array is it? If it does, that's a problem.
Not specifically. The protocol is a class and may be named anything. If it's named the same things as another variable, e.g. the array = TypeVar("array", bound=Array)
then that can be problematic.
As a more zoomed out view. The goal of this and subsequent PRs is to enable the spec to also be an installable package that can be used for run-time checks of adherence to this spec as well as for general type hinting.
>>> from data_apis.array import Array # this spec
>>> from numpy import array, ndarray # post v2.0
>>> issubclass(ndarray, Array)
True
>>> isinstance(array(1), Array)
True
To be clear, Array
cannot be instantiated since it is a protocol without any implementation. Its use is entirely for structural subtyping (e.g. type hinting) and run-time checks.
At some level it's important that the spec adhere to Python standards, e.g. naming conventions.
The class name is specifically not part of the standard.
I'm not sure I understand then what's the problem with calling it Array
?
LGTM, though it might be good to verify if the currently available downstream implementations properly satisfy subtype the protocol as this could potentially reveal some typing-related corner cases that we've missed thus far. Could be something for a follow up though.
It would be great to see this released. It looks very developed to me, and would be useful for us within xarray.