array-api icon indicating copy to clipboard operation
array-api copied to clipboard

Add an Array Protocol & improve static typing support

Open nstarman opened this issue 2 years ago • 16 comments

Continuing from https://github.com/data-apis/array-api/issues/584, I've done a quick refactor of the array (and dtype) objects to make them typing Protocols. @rgommers, I agree that making Array generic wrt the dtype is good for a followup.

It's not perfect, but mypy doesn't complain a whole lot, so I think this is a good start. I'm happy to make refinements.

nstarman avatar Feb 05 '23 23:02 nstarman

We might want to put these in a follow-up PR if just to ship the concretely non-controversial stuff first—interested in what others think.

I was thinking the same thing last evening about moving this to a followup PR.

On namespace type hints (e.g. _CreatoinFuncs, ArrayAPINamespace): IMO I'm not sure on this yet, as they make contributing to the spec less ergonomic given we repeat the signatures, although they are pretty nifty to have. Is there a world where these could be dynamically construct them from the stubs instead?

Maybe? I haven't experimented yet but perhaps mypy will be satisfied with:

class ArrayAPINamespace(Protocol):

    zeros_like = staticmethod(creation_funcs.zeros_like)

Also for a future PR: zeros_like could be promoted to a Protocol,


class Array(Protocol):
    def __array_namespace__(self, ...) -> ArrayAPINamespace[Self]: ...


class ArrayAPINamespace(Protocol[Array]):

    zeros_like = staticmethod(zeros_like)


class zeros_like(Protocol[Array]):
    def __call__(self, x: Array, /, *, dtype: Optional[dtype] = None, device: Optional[device] = None) -> Array: ...

The advantage of this is that functions can be checked to see if their signature is compatible with zeros_like.

nstarman avatar Feb 06 '23 18:02 nstarman

@BvB93 if you have any feedback on this PR, that would be great to see.

rgommers avatar Feb 21 '23 21:02 rgommers

@nstarman there are 3 errors left when running mypy array_api_stubs/_draft/array_object.py (from lots more on main). There are more errors in other files - but if we are going to maintain static typing support in this repo, adding a CI job running Mypy seems feasible. We could start with this file, and expand it to all files later. WDYT?

rgommers avatar Feb 21 '23 21:02 rgommers

One thing that isn't nice is that Self now shows up instead of array in the signatures of the html docs.

If the array type is just text in the html docs and not 'clickable', then we can rename Self to array.

adding a CI job running Mypy seems feasible. We could start with this file, and expand it to all files later. WDYT?

That sounds like a good idea. I can try adding to the CI and create an ignore file that can later be whittled down as the package is made more type friendly.

nstarman avatar Feb 21 '23 22:02 nstarman

If the array type is just text in the html docs and not 'clickable', then we can rename Self to array.

That works. It's not clickable - there is no array object in the standard on purpose, because in the end we don't care whether it's named ndarray, Tensor, or something else.

rgommers avatar Feb 21 '23 22:02 rgommers

@rgommers Sorry for the long interlude. I can't seem to resolve 3 linking errors in the docs https://app.circleci.com/pipelines/github/data-apis/array-api/1088/workflows/ed2dfa75-c3c7-4845-8f28-e01fe35d0c8d/jobs/996?invite=true#step-103-1646

nstarman avatar Aug 12 '23 21:08 nstarman

Sphinx is trying to create links for the type hints. I think to fix that you need to update the list(s) here https://github.com/data-apis/array-api/blob/main/src/_array_api_conf.py#L52-L77

asmeurer avatar Aug 14 '23 19:08 asmeurer

I can squash commits if you don't use Squash & Merge (assuming this is approved, of course :) ). Thanks @asmeurer for the pointer.

nstarman avatar Aug 14 '23 20:08 nstarman

I think pre-commit is a good addition (would love to add hooks for other spec things myself), but probably not too useful until we get a workflow that runs it. That said, I'm not too sure if we'd want to do that just yet 🤔

pre-commit.ci can be turned on, sans additional GH workflows. black, and ruff would be wonderful as well!

nstarman avatar Aug 18 '23 14:08 nstarman

Looks good on my end!

Thanks @honno for the reviews and discussion.

nstarman avatar Aug 19 '23 20:08 nstarman

Any reason to change array to Array? I think we might prefer to keep it as array like it is now.

asmeurer avatar Aug 30 '23 20:08 asmeurer

The class would have to be called array, which is problematic re PEP-8 and also conflicts name-wise with the array TypeVar. The latter can be changed, but for the former isn't it desirable to refer to array instances of some class Array?

nstarman avatar Aug 30 '23 22:08 nstarman

The class name is specifically not part of the standard. I always felt that array made that a little clearer than Array, but maybe that's just me.

By the way, the protocol isn't going to require the class to be called Array is it? If it does, that's a problem.

asmeurer avatar Aug 30 '23 22:08 asmeurer

By the way, the protocol isn't going to require the class to be called Array is it? If it does, that's a problem.

Not specifically. The protocol is a class and may be named anything. If it's named the same things as another variable, e.g. the array = TypeVar("array", bound=Array) then that can be problematic.

As a more zoomed out view. The goal of this and subsequent PRs is to enable the spec to also be an installable package that can be used for run-time checks of adherence to this spec as well as for general type hinting.

>>> from data_apis.array import Array  # this spec
>>> from numpy import array, ndarray  # post v2.0

>>> issubclass(ndarray, Array)
True

>>> isinstance(array(1), Array)
True

To be clear, Array cannot be instantiated since it is a protocol without any implementation. Its use is entirely for structural subtyping (e.g. type hinting) and run-time checks. At some level it's important that the spec adhere to Python standards, e.g. naming conventions.

The class name is specifically not part of the standard.

I'm not sure I understand then what's the problem with calling it Array?

nstarman avatar Aug 31 '23 17:08 nstarman

LGTM, though it might be good to verify if the currently available downstream implementations properly satisfy subtype the protocol as this could potentially reveal some typing-related corner cases that we've missed thus far. Could be something for a follow up though.

BvB93 avatar Sep 12 '23 12:09 BvB93

It would be great to see this released. It looks very developed to me, and would be useful for us within xarray.

TomNicholas avatar Jan 29 '24 20:01 TomNicholas