Array namespace
Proof of concept.
Benefits:
- Proper type hint for the return type of `__array_namespace__`.
- `arange` now has a statically understood signature, e.g. `func: arange` is meaningful.
- Can easily be extended to encompass all the functions in this repo.
Fixes #267
The change in `arange` is because of callback protocols (https://peps.python.org/pep-0544/#callback-protocols), which allow for static comparison of the signature. As this relates to `__array_namespace__`, it means we can type hint `ArrayAPINamespace.arange` without having to copy the whole function signature, which is liable to fall out of sync with the actual definition, e.g.
```python
class ArrayAPINamespace(Protocol):
    arange: ArangeCallable
```
vs.
```python
class ArrayAPINamespace(Protocol):
    @staticmethod
    def arange(start: Union[int, float], /, stop: Optional[Union[int, float]] = None, step: Union[int, float] = 1, *, dtype: Optional[dtype] = None, device: Optional[device] = None) -> array:
        ...
```
Per the docs rendering: class types can be manipulated. At https://github.com/cosmology-api/cosmology.api/tree/3f7ae746a166201298ebd8a786249b58aefdfe3d/docs/_ext we have some deep magic doing signature manipulations (@ntessore wrote this). So it's doable to have `arange` be a Protocol but have it rendered as a function.
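Presumably the core trick is small: a function-style signature can be recovered from a `__call__`-based protocol with `inspect`, and the docs machinery can then render that instead. A minimal sketch of the idea (not the actual linked extension):

```python
import inspect

def protocol_to_function_signature(proto: type) -> inspect.Signature:
    """Recover a plain function signature from a __call__-based protocol."""
    sig = inspect.signature(proto.__call__)
    # Drop the leading `self` so the docs show a function, not a method.
    params = list(sig.parameters.values())[1:]
    return sig.replace(parameters=params)
```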
Also, callback protocols mean the following is possible:

```python
>>> from numpy import arange
>>> from data_apis.array import arange as ArangeCallable

>>> isinstance(arange, ArangeCallable)
True

>>> def mycustomarange(...<correct signature>): ...
>>> isinstance(mycustomarange, ArangeCallable)
True
```
Hmm, I am not knowledgeable enough about the latest in static typing to understand the callback rationale here. I do see that in NumPy `arange` is typed as a regular function with overloads. @BvB93, do you have an opinion about typing `arange` this way?
It's (unfortunately) more contrived than ideal, but the typing itself is perfectly solid.
Just to give a bit more background:
It's long been possible to create an object (well... a declaration thereof) using a type, e.g. `foo: Callable[..., int] = blablabla`. Unfortunately, it's not possible to go the other way around and create a type from an existing object. Without something like a Python equivalent of TypeScript's `typeof` operator (or some other way of reusing `def` statements for annotating namespaces), we're stuck with the `__call__`-based protocol approach as used in this PR.
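For concreteness, a minimal sketch of what such a `__call__`-based callback protocol looks like (`Any` is a stand-in here for the standard's `dtype`/`device`/`array` types, which aren't shown in this thread):

```python
from typing import Any, Optional, Protocol, Union, runtime_checkable

@runtime_checkable
class ArangeCallable(Protocol):
    """Anything whose __call__ matches this signature counts as an arange."""

    def __call__(
        self,
        start: Union[int, float],
        /,
        stop: Optional[Union[int, float]] = None,
        step: Union[int, float] = 1,
        *,
        dtype: Optional[Any] = None,  # stand-in for the standard's dtype type
        device: Optional[Any] = None,  # stand-in for the standard's device type
    ) -> Any:  # stand-in for the standard's array type
        ...
```

One caveat: `@runtime_checkable` is what makes runtime `isinstance` checks like the earlier example work, but at runtime those checks only verify that a `__call__` attribute exists; the full signature comparison happens statically only.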
Thanks! And to confirm I understood it right: we're going to need this approach for all functions that don't take an array as an input (mostly array creation functions), and not for anything else, right? Or would we have to change all functions to classes with `__call__` methods?

> Or would we have to change all functions to classes with `__call__` methods?
The latter, I'm afraid, or at least for all functions that you'd like to include in the array namespace protocol. At best you might be able to reuse a single protocol multiple times to represent different functions with identical signatures, but I imagine that this could cause issues with docstrings and such.
Hmm. I wonder if it would be feasible to codegen type stubs here? If for every function `def func(...)` in a `.py` file we'd generate a corresponding `class func: def __call__(self, ...)` entry in a `.pyi` file, we could keep things normal in `.py` files while still making Mypy & co. happy.
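Concretely, the pairing would look something like this (hypothetical file names, bodies elided):

```python
# creation.py -- authored as a plain, idiomatic function
def arange(start, /, stop=None, step=1, *, dtype=None, device=None):
    ...

# creation.pyi -- generated stub: the same name, but declared as a class with
# __call__, so that `func: arange` works as a callback-protocol annotation
from typing import Optional, Protocol, Union

class arange(Protocol):
    def __call__(
        self,
        start: Union[int, float],
        /,
        stop: Optional[Union[int, float]] = None,
        step: Union[int, float] = 1,
        *,
        dtype=None,
        device=None,
    ) -> object: ...
```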
I would suggest doing the reverse: code-gen the function from the class, and use the code-gen'ed function in the docs. The callback protocols are useful for type checking, both statically and at runtime (see https://github.com/data-apis/array-api/pull/685#issuecomment-1715634045 for examples).

My hope is to have this library be installable and useful for type-checking purposes and as ABCs (protocols are ABCs when used as a base class). Avoiding magic and meta-coding in the actual library will help in achieving that goal. For communication purposes in the docs, I agree that a function is the best representation; hence, code-gen the function for the docs and the `__call__` protocol for the code.
> My hope is to have this library be installable and useful for type-checking purposes and as ABCs (protocols are ABCs when used as a base class).
Once we have some parser for automatically going from `def` to `class` or vice versa, it shouldn't really matter which direction we go (outside of whatever is considered more aesthetically pleasing during development), no? When actually building the package we could perform this conversion automatically either way.
I think it ends up mattering in a few cases:
- installing a dev branch by installing the repo with `pip install -e .`
- type checking in CI, e.g. mypy in pre-commit
- using an IDE when developing the library

Composing that list: I guess it's the same for anyone using a compiled wheel, but for developers close to the code, having to code-gen the type-correct classes from functions will be a pain and make the code effectively equivalent to being written in a compiled language, a la https://xkcd.com/303.
> Composing that list: I guess it's the same for anyone using a compiled wheel,
I'm not too worried about the users of the installed package; I'm thinking more from the point of view of working on the standard itself. In the end that is the primary purpose here: authoring a well-documented API standard. These things are functions in the standard after all, so having it all converted to classes with `__call__` methods solely because static typing in Python is so limited makes it harder to work on the standard. I don't expect many contributors are experts in static typing rules, so I think it'll raise a few eyebrows when anyone sees these class definitions for the first time.

Also, from a code reusability point of view: the current functions are idiomatic. You can copy them and fill in the body of each function to get a standard-compliant implementation.
> installing a dev branch by installing the repo with `pip install -e .`
This one is easy to fix, as is an in-place build when using an IDE (those are effectively the same): you can run the codegen as part of the install.
> type checking in CI, e.g. mypy in pre-commit
I think this one may be the only real issue, because Mypy is bad at running against an installed package. It'd be one extra line to install the `.pyi` files in-tree before running Mypy, though.
I second @rgommers' opinion that we should continue authoring as functions rather than as classes. Authoring as classes increases complexity and raises contribution barriers. I'd prefer to hide this complexity behind automated function-to-class conversion.
I did some trial and error yesterday and managed to cobble together a script for automatically carrying out this `def ...` -> `class ...` conversion. Turns out it's not all that complicated, though I did let `black` and `isort` handle the final formatting of the file, as doing it in a more manual fashion with `ast` sounds like a nightmare: https://gist.github.com/BvB93/b659c9145cde08eb338053d7533306fb.
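For flavor, a much-reduced sketch of the same idea (the linked gist is more thorough): parse the module with `ast` and wrap each top-level `def` in a `Protocol` class whose `__call__` carries the original signature.

```python
import ast

SOURCE = '''
def arange(start, /, stop=None, step=1, *, dtype=None, device=None) -> "array":
    ...
'''

def function_to_protocol(func: ast.FunctionDef) -> ast.ClassDef:
    """Wrap a top-level `def` in a `Protocol` class with a `__call__`."""
    # Build the class shell from a template so all AST fields are populated.
    shell = ast.parse("class _shell(Protocol):\n    ...").body[0]
    shell.name = func.name
    # The function becomes `__call__` and gains a leading `self` parameter.
    func.name = "__call__"
    func.args.posonlyargs.insert(0, ast.arg(arg="self"))
    shell.body = [func]
    return shell

module = ast.parse(SOURCE)
module.body = [
    function_to_protocol(node) if isinstance(node, ast.FunctionDef) else node
    for node in module.body
]
print(ast.unparse(ast.fix_missing_locations(module)))
```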
> I think this one may be the only real issue, because Mypy is bad at running against an installed package. It'd be one extra line to install the `.pyi` files in-tree before running Mypy, though.
The biggest obstacle would probably be a setuptools >= 64 regression that breaks the type checking of editable installations. There are workarounds for this, though: https://github.com/python/mypy/issues/13392#issuecomment-1594209324.
@BvB93 thanks for pointing out that editable-install issue. I'll note that it will also affect `numpy` and any other users of meson-python (and scikit-build-core too); import hooks are a must-have for out-of-tree builds. I'm a little surprised that no one has noticed this on SciPy or NumPy yet - IDE and static analysis tools seem to work okay there with editable installs, which employ import hooks (unless I missed the bug reports).
> I'm a little surprised that no one has noticed this on SciPy or NumPy yet - IDE and static analysis tools seem to work okay there with editable installs, which employ import hooks (unless I missed the bug reports).
Right, I did a little bit more reading and some trial & error; I think we might be in the clear here. The relevant issue seems to apply only when (a) a package A is installed in editable mode and (b) a package B imports from package A, with B then unable to see A's annotations. So this could be a potential downstream annoyance for packages that want access to the array API's annotations, but it shouldn't affect the array-api repo itself.