Array namespace
Proof of concept.
Benefits:
- Proper type hint for the return type of `__array_namespace__`.
- `arange` now has a statically understood signature, e.g. `func: arange` is meaningful.
- Can easily be extended to encompass all the functions in this repo.
Fixes #267
The change in `arange` is because of callback protocols (https://peps.python.org/pep-0544/#callback-protocols), which allow for static comparison of the signature. As this relates to `__array_namespace__`, it means we can type hint `ArrayAPINamespace.arange` without having to copy the whole function signature, which is liable to fall out of sync with the actual definition, e.g.
```python
class ArrayAPINamespace(Protocol):
    arange: ArangeCallable
```
vs.
```python
class ArrayAPINamespace(Protocol):
    @staticmethod
    def arange(start: Union[int, float], /, stop: Optional[Union[int, float]] = None, step: Union[int, float] = 1, *, dtype: Optional[dtype] = None, device: Optional[device] = None) -> array:
        ...
```
Per the docs rendering: class types can be manipulated. At https://github.com/cosmology-api/cosmology.api/tree/3f7ae746a166201298ebd8a786249b58aefdfe3d/docs/_ext we have some deep magic doing signature manipulations (@ntessore wrote this). So it's doable to have `arange` be a Protocol but have it rendered as a function.
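Presumably the core trick is small: a function-style signature can be recovered from a `__call__`-based protocol with `inspect`, and the docs machinery can then render that instead. A minimal sketch of the idea (not the actual linked extension):

```python
import inspect

def protocol_to_function_signature(proto: type) -> inspect.Signature:
    """Recover a plain function signature from a __call__-based protocol."""
    sig = inspect.signature(proto.__call__)
    # Drop the leading `self` so the docs show a function, not a method.
    params = list(sig.parameters.values())[1:]
    return sig.replace(parameters=params)
```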
Also, callback protocols mean the following is possible:

```python
>>> from numpy import arange
>>> from data_apis.array import arange as ArangeCallable

>>> isinstance(arange, ArangeCallable)
True

>>> def mycustomarange(...<correct signature>): ...
>>> isinstance(mycustomarange, ArangeCallable)
True
```
Hmm, I am not knowledgeable enough about the latest in static typing to understand the callback rationale here. I do see that in NumPy `arange` is typed as a regular function with overloads. @BvB93, do you have an opinion about typing `arange` this way?
It's (unfortunately) more contrived than ideal, but the typing itself is perfectly solid.
Just to give a bit more background:
It's long been possible to create an object (well... a declaration thereof) using a type, e.g. `foo: Callable[..., int] = blablabla`. Unfortunately, it's not possible to go the other way around and create a type from an existing object. Without something like a Python equivalent of TypeScript's `typeof` operator (or some other way of reusing `def` statements for annotating namespaces), we're stuck with the `__call__`-based protocol approach as used in this PR.
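For concreteness, a minimal sketch of what such a `__call__`-based callback protocol looks like (`Any` is a stand-in here for the standard's `dtype`/`device`/`array` types, which aren't shown in this thread):

```python
from typing import Any, Optional, Protocol, Union, runtime_checkable

@runtime_checkable
class ArangeCallable(Protocol):
    """Anything whose __call__ matches this signature counts as an arange."""

    def __call__(
        self,
        start: Union[int, float],
        /,
        stop: Optional[Union[int, float]] = None,
        step: Union[int, float] = 1,
        *,
        dtype: Optional[Any] = None,  # stand-in for the standard's dtype type
        device: Optional[Any] = None,  # stand-in for the standard's device type
    ) -> Any:  # stand-in for the standard's array type
        ...
```

One caveat: `@runtime_checkable` is what makes runtime `isinstance` checks like the earlier example work, but at runtime those checks only verify that a `__call__` attribute exists; the full signature comparison happens statically only.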
Thanks! And to confirm I understood it right: we're going to need this approach for all functions that don't take an array as an input (mostly array creation functions), and not for anything else, right? Or would we have to change all functions to classes with `__call__` methods?

> Or would we have to change all functions to classes with `__call__` methods?
The latter, I'm afraid, or at least for all functions that you'd like to include in the array namespace protocol. At best you might be able to reuse a single protocol multiple times to represent different functions with identical signatures, but I imagine that this could cause issues with docstrings and such.
Hmm. I wonder if it would be feasible to codegen type stubs here? If for every function `def func(...)` in a `.py` file we'd generate a corresponding `class func: def __call__(self, ...)` entry in a `.pyi` file, we could keep things normal in `.py` files while still making Mypy & co. happy.
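Concretely, the pairing would look something like this (hypothetical file names, bodies elided):

```python
# creation.py -- authored as a plain, idiomatic function
def arange(start, /, stop=None, step=1, *, dtype=None, device=None):
    ...

# creation.pyi -- generated stub: the same name, but declared as a class with
# __call__, so that `func: arange` works as a callback-protocol annotation
from typing import Optional, Protocol, Union

class arange(Protocol):
    def __call__(
        self,
        start: Union[int, float],
        /,
        stop: Optional[Union[int, float]] = None,
        step: Union[int, float] = 1,
        *,
        dtype=None,
        device=None,
    ) -> object: ...
```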
I would suggest doing the reverse: code-gen the function from the class, and use the code-gen'ed function in the docs. The callback protocols are useful for type checking, both statically and at runtime (see https://github.com/data-apis/array-api/pull/685#issuecomment-1715634045 for examples).

My hope is to have this library be installable and useful for type-checking purposes and as ABCs (protocols are ABCs when used as a base class). Avoiding magic and meta-coding in the actual library will help in achieving that goal. For communication purposes in the docs, I agree that a function is the best representation; hence, code-gen the function for the docs and the `__call__` protocol for the code.
> My hope is to have this library be installable and useful for type-checking purposes and as ABCs (protocols are ABCs when used as a base class).
Once we have some parser for automatically going from `def` to `class` or vice versa, it shouldn't really matter which direction we go (outside of whatever is considered more aesthetically pleasing during development), no? When actually building the package we could perform this conversion automatically either way.
I think it ends up mattering in a few cases:
- installing a dev branch by installing the repo with `pip install -e .`
- type checking in CI, e.g. mypy in pre-commit
- using an IDE when developing the library

Composing that list: I guess it's the same for anyone using a compiled wheel, but for developers close to the code, having to code-gen the type-correct classes from functions will be a pain and make the code effectively equivalent to being written in a compiled language, a la https://xkcd.com/303.
> Composing that list: I guess it's the same for anyone using a compiled wheel,
I'm not too worried about the users of the installed package; I'm thinking more from the point of view of working on the standard itself. In the end that is the primary purpose here: authoring a well-documented API standard. These things are functions in the standard after all, so having it all converted to classes with `__call__` methods solely because static typing in Python is so limited makes it harder to work on the standard. I don't expect many contributors are experts in static typing rules, so I think it'll raise a few eyebrows when anyone sees these class definitions for the first time.

Also, from a code reusability point of view: the current functions are idiomatic. You can copy them and fill in the body of each function to get a standard-compliant implementation.
> installing a dev branch by installing the repo with `pip install -e .`
This one is easy to fix, as is an in-place build when using an IDE (those are effectively the same): you can run the codegen as part of the install.
> type checking in CI, e.g. mypy in pre-commit
I think this one may be the only real issue, because Mypy is bad at running against an installed package. It'd be one extra line to install the `.pyi` files in-tree before running Mypy, though.
I second @rgommers' opinion that we should continue authoring as functions rather than as classes. Authoring as classes increases complexity and raises contribution barriers. I'd prefer to hide this complexity behind automated function-to-class conversion.
I did some trial and error yesterday and managed to cobble together a script for automatically carrying out this `def ...` -> `class ...` conversion. Turns out it's not all that complicated, though I did let `black` and `isort` handle the final formatting of the file, as doing it in a more manual fashion with `ast` sounds like a nightmare: https://gist.github.com/BvB93/b659c9145cde08eb338053d7533306fb.
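For flavor, a much-reduced sketch of the same idea (the linked gist is more thorough): parse the module with `ast` and wrap each top-level `def` in a `Protocol` class whose `__call__` carries the original signature.

```python
import ast

SOURCE = '''
def arange(start, /, stop=None, step=1, *, dtype=None, device=None) -> "array":
    ...
'''

def function_to_protocol(func: ast.FunctionDef) -> ast.ClassDef:
    """Wrap a top-level `def` in a `Protocol` class with a `__call__`."""
    # Build the class shell from a template so all AST fields are populated.
    shell = ast.parse("class _shell(Protocol):\n    ...").body[0]
    shell.name = func.name
    # The function becomes `__call__` and gains a leading `self` parameter.
    func.name = "__call__"
    func.args.posonlyargs.insert(0, ast.arg(arg="self"))
    shell.body = [func]
    return shell

module = ast.parse(SOURCE)
module.body = [
    function_to_protocol(node) if isinstance(node, ast.FunctionDef) else node
    for node in module.body
]
print(ast.unparse(ast.fix_missing_locations(module)))
```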
> I think this one may be the only real issue, because Mypy is bad at running against an installed package. It'd be one extra line to install the `.pyi` files in-tree before running Mypy, though.
The biggest obstacle would probably be a setuptools >= 64 regression that breaks the type checking of editable installations. There are workarounds for this, though: https://github.com/python/mypy/issues/13392#issuecomment-1594209324.
@BvB93 thanks for pointing out that editable-install issue. I'll note that it will also affect `numpy` and any other users of meson-python (and scikit-build-core too); import hooks are a must-have for out-of-tree builds. I'm a little surprised that no one has noticed this on SciPy or NumPy yet - IDE and static analysis tools seem to work okay there with editable installs, which employ import hooks (unless I missed the bug reports).
> I'm a little surprised that no one has noticed this on SciPy or NumPy yet - IDE and static analysis tools seem to work okay there with editable installs, which employ import hooks (unless I missed the bug reports).
Right, I did a little bit more reading and some trial & error; I think we might be in the clear here. The relevant issue seems to apply only when (a) a package A is installed in editable mode and (b) a package B imports from package A, with B then unable to see A's annotations. So this could be a potential downstream annoyance for packages that want access to the array API's annotations, but it shouldn't affect the array-api repo itself.