pybind11
pybind11 copied to clipboard
feat(types) Numpy.typing.NDArray
Description
Switches to using numpy.typing.NDArray for typing annotations over numpy.ndarray. This is because numpy.ndarray is implemented with the first argument for some future size annotation. See https://github.com/numpy/numpy/issues/16544. Since the first argument is for the size annotation the typing of the numpy.ndarray is basically not being used. I also modified the eigen matrix/tensor to numpy.typing.NDArray as well even though stubgen fails to generate stubs since numpy.typing.NDArray[numpy.float64[...]] is not a valid type.
def foo(x: numpy.typing.NDArray[numpy.float64]) -> None:
reveal_type(x) # ndarray[Any, dtype[floating[_64Bit]]
reveal_type(x.dtype) # dtype[float64]
def bar(x: numpy.ndarray[numpy.float64]) -> None:
reveal_type(x) # ndarray[float64, Any]
reveal_type(x.dtype) # Any
Suggested changelog entry:
Switched to `numpy.typing.NDArray`
I don't know enough about typing to meaningfully weigh the pros and cons of changing from ndarray to NDArray. @henryiii could you help with that?
Regarding the tuple[()] change, we have this already:
https://github.com/pybind/pybind11/blob/51c2aa16de5b50fe4be6a0016d6090d4a831899e/include/pybind11/typing.h#L138-L142
Is your change needed anyway? Could you send a separate, small PR for that, including a test that fails without your change?
NDArray is the public type for NDArrays, and hides the "size" related type (as described above). It also uses a different form for the DType. Just curious, what about changing it to numpy.ndarray[Any, numpy.float64]?
FYI, I get an error on your example:
$ bat tmp.py
───────┬─────────────────────────────────────────
│ File: tmp.py
───────┼─────────────────────────────────────────
1 │ import numpy
2 │ from typing import Any
3 │
4 │ v: numpy.typing.NDArray[numpy.float64]
5 │ reveal_type(v)
6 │ reveal_type(v.dtype)
7 │
8 │ q: numpy.ndarray[Any, numpy.float64]
9 │ reveal_type(q)
10 │ reveal_type(q.dtype)
11 │
12 │
───────┴─────────────────────────────────────────
$ pipx install mypy
$ pipx inject mypy numpy
$ mypy tmp.py
tmp.py:5: note: Revealed type is "numpy.ndarray[Any, numpy.dtype[numpy.floating[numpy._typing._64Bit]]]"
tmp.py:6: note: Revealed type is "numpy.dtype[numpy.floating[numpy._typing._64Bit]]"
tmp.py:8: error: Type argument "floating[_64Bit]" of "ndarray" must be a subtype of "dtype[Any]" [type-var]
tmp.py:9: note: Revealed type is "numpy.ndarray[Any, numpy.floating[numpy._typing._64Bit]]"
tmp.py:10: note: Revealed type is "numpy.floating[numpy._typing._64Bit]"
Found 1 error in 1 file (checked 1 source file)
In my example I have numpy.ndarray[numpy.float64] which is what was currently being done to shown the error. You have numpy.ndarray[Any, numpy.float64] which is also incorrect. It would need to be numpy.ndarray[Any, numpy.dtype[numpy.float64]]] to create the proper type annotation without numpy.typing. If you would prefer numpy.ndarray[Any, numpy.dtype[numpy.float64]]] I can update it to be that.
The valid options are
numpy.ndarray[Any, numpy.dtype[numpy.float64]]]numpy.typing.NDArray[numpy.float64]
I think numpy.typing.NDArray is the cleanest way to resolve this issue.
I can't find any documentation for the shape, writeable and contiguous attributes in the type hint. Is this correct?
numpy.typing.NDArray[numpy.float32[5, 6], flags.writeable, flags.f_contiguous]
I don't think these are valid for numpy.ndarray either.
Needs update after #5486
@timohl I have tried some various things like to get io_name working with templated arguments. Is there something obvious I am missing?
PYBIND11_TYPE_CASTER(type,
io_name("numpy.typing.ArrayLike",
"numpy.typing.NDArray["
+ npy_format_descriptor<T>::name + "]"));
@timohl I have tried some various things like to get
io_nameworking with templated arguments. Is there something obvious I am missing?
Yes, but maybe not so obvious.
io_name only supports constexpr c-string arguments.
The type of npy_format_descriptor<T>::name is some instance of the template struct descr.
This disallows nesting of io_name which would break things.
Maybe something like this could work:npy_format_descriptor<T>::name.text
I have drafted a working version using io_name with c-strings here: #5502.
With this, I was able to build Open3D (with minor modification), run pybind-stubgen, and run mypy successfully on all example code without errors about numpy types from Eigen. 🚀
We should discuss what kind of numpy type we should use.
In #5502 I used something like this:
For args: typing.Annotated[numpy.typing.ArrayLike, numpy.float32, "[m, 1]", "flags.writeable", "flags.c_contiguous"]
For return: typing.Annotated[numpy.typing.NDArray[numpy.float32], "[m, 1]", "flags.writeable", "flags.c_contiguous"]
I think a combination of typing.Annotated, numpy.typing.ArrayLike and numpy.typing.NDArray would be best to keep all available information while providing valid types.
Open questions:
- Format of
typing.Annotatedarguments: I used strings to use"[m, 1]"as shape, since m is usually not defined. Other options might be to use something like nptyping or annotated-types but adding 3rd party dependencies should rather be an option in stubgen, not pybind itself in my opinion - Should all flags be shown in return type aswell or only in args? I think they are used to provide better error messages when a user passes an array that does not conform
- Where to use
ArrayLikeorNDArray? Right now, I just added it to all args, but I have not completely checked the code of type casters for every Eigen variant if they really accept for example lists as input.
@timohl I copied over your changes but, I seem to be getting some errors that have to with tools/FindCatch.cmake. To me they look unrelated but, maybe I am missing something. Any ideas?
Hmm it seems like maybe it just was a server issue.
Great to hear that it is working now!
One small style question @InvincibleRMC:
Any reason why you are using escaped quotation marks (\" instead of plain ") in multi-line strings?
To my understanding, there is no need to escape, but maybe it is preferable due to another reason?
Also, a note regarding the open question in my last comment:
Format of typing.Annotated arguments: I used strings to use "[m, 1]" as shape, since m is usually not defined. Other options might be to use something like nptyping or annotated-types but adding 3rd party dependencies should rather be an option in stubgen, not pybind itself in my opinion
My suggestion would be to use strings (as implemented in this PR) as long as there is no standard way of defining shapes/properties of numpy arrays. Right now, the point is to inform the user that the passed array does not conform to certain requirements. So, until numpy defines a way to specify those, strings should be able to inform without breaking type checking.
Here also an overview of several variants with mypy/pylance/ruff errors. TLDR: f1 (this PR) reports no errors for signature and call, and contains all available info.
import typing
from typing import Any, Literal, TypeVar
import numpy
# This PR (no errors):
def f1(
x: typing.Annotated[numpy.typing.ArrayLike, numpy.float64, "[m, 1]", "flags.writeable"],
) -> typing.Annotated[numpy.typing.NDArray[numpy.float64], "[m, 1]"]: ...
# Call f1 (no errors):
f1(numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# Current state
#
# Errors:
# PylancereportInvalidTypeArguments:
# Type "float64" cannot be assigned to type variable "_ShapeT_co@ndarray"
# Type "float64" is not assignable to upper bound "_Shape" for type variable "_ShapeT_co@ndarray"
# "float64" is not assignable to "tuple[int, ...]"
# Ruff(F821):
# Undefined name `m`
# Undefined name `flags`
# PylancereportUndefinedVariable:
# "m" is not defined
# "flags" is not defined
# PylancereportInvalidTypeArguments:
# Expected no type arguments for class "float64"
# mypy[type-arg]: "float64" expects no type arguments, but 2 given
# mypy[name-defined]: Name "flags" is not defined
# mypy[type-var]: Type argument "float64" of "ndarray" must be a subtype of "tuple[int, ...]"
# mypy[name-defined]: Name "m" is not defined [name-defined]
# mypy[valid-type]: Invalid type: try using Literal[1] instead?
def f2(
x: numpy.ndarray[numpy.float64[m, 1], flags.writeable],
) -> numpy.ndarray[numpy.float64[m, 1], flags.writeable]: ...
# Suggestion by henryiii
#
# Errors:
# PylancereportInvalidTypeArguments:
# Type "float64" cannot be assigned to type variable "_DType_co@ndarray"
# Type "float64" is not assignable to upper bound "dtype[Any]" for type variable "_DType_co@ndarray"
# "float64" is not assignable to "dtype[Any]"
# mypy[type-var]: Type argument "float64" of "ndarray" must be a subtype of "dtype[Any]"
def f3(x: numpy.ndarray[Any, numpy.float64]) -> numpy.ndarray[Any, numpy.float64]: ...
# Fix of f3 by gentlegiantJGC (No errors, but no shape info):
def f4(
x: numpy.ndarray[Any, numpy.dtype[numpy.float64]],
) -> numpy.ndarray[Any, numpy.dtype[numpy.float64]]: ...
# Call f4 (no errors):
f4(numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# f4 with wrong shape:
#
# Errors:
# PylancereportGeneralTypeIssues:
# Expected class but received "tuple[Literal[3], Literal[3]]"
# mypy[syntax]:
# error: Syntax error in type annotation
# note: Suggestion: Use Tuple[T1, ..., Tn] instead of (T1, ..., Tn)
def f5(
x: numpy.ndarray[(3, 3), numpy.dtype[numpy.float64]],
) -> numpy.ndarray[(3, 3), numpy.dtype[numpy.float64]]: ...
# f4 with wrong shape:
#
# Errors:
# PylancereportGeneralTypeIssues:
# Expected class but received "Literal[3]"
# mypy[valid-type]: Invalid type: try using Literal[3] instead?
def f6(
x: numpy.ndarray[tuple[3, 3], numpy.dtype[numpy.float64]],
) -> numpy.ndarray[tuple[3, 3], numpy.dtype[numpy.float64]]: ...
# f4 with correct shape (No errors, but only int possible, not `m`):
def f7(
x: numpy.ndarray[tuple[Literal[3], Literal[3]], numpy.dtype[numpy.float64]],
) -> numpy.ndarray[tuple[Literal[3], Literal[3]], numpy.dtype[numpy.float64]]: ...
# Call f7:
#
# Errors:
# PylancereportArgumentType:
# Argument of type "NDArray[Any]" cannot be assigned to parameter "x" of type "ndarray[tuple[Literal[3], Literal[3]], dtype[float64]]" in function "f7"
# "ndarray[_Shape, dtype[Any]]" is not assignable to "ndarray[tuple[Literal[3], Literal[3]], dtype[float64]]"
# Type parameter "_ShapeT_co@ndarray" is covariant, but "_Shape" is not a subtype of "tuple[Literal[3], Literal[3]]"
# "tuple[int, ...]" is not assignable to "tuple[Literal[3], Literal[3]]"
# Tuple size mismatch; expected 2 but received indeterminate
# mypy[arg-type]: Argument 1 to "f7" has incompatible type "ndarray[tuple[int, ...], dtype[float64]]"; expected "ndarray[tuple[Literal[3], Literal[3]], dtype[float64]]"
f7(numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# f4 with correct shape + m as TypeVar (No errors):
# Two int in TypeVar, otherwide PylancereportGeneralTypeIssues: TypeVar must have at least two constrained types
m = TypeVar("m", int, int)
def f8(
x: numpy.ndarray[tuple[m, Literal[3]], numpy.dtype[numpy.float64]],
) -> numpy.ndarray[tuple[m, Literal[3]], numpy.dtype[numpy.float64]]: ...
# Call f8:
#
# Errors:
# PylancereportArgumentType:
# Argument of type "NDArray[Any]" cannot be assigned to parameter "x" of type "ndarray[tuple[m@f8, Literal[3]], dtype[float64]]" in function "f8"
# "ndarray[_Shape, dtype[Any]]" is not assignable to "ndarray[tuple[m@f8, Literal[3]], dtype[float64]]"
# Type parameter "_ShapeT_co@ndarray" is covariant, but "_Shape" is not a subtype of "tuple[m@f8, Literal[3]]"
# "tuple[int, ...]" is not assignable to "tuple[m@f8, Literal[3]]"
# Tuple size mismatch; expected 2 but received indeterminate
# mypy[arg-type]: Argument 1 to "f8" has incompatible type "ndarray[tuple[int, ...], dtype[float64]]"; expected "ndarray[tuple[int, Literal[3]], dtype[float64]]"
f8(numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
One small style question @InvincibleRMC: Any reason why you are using escaped quotation marks (
\"instead of plain") in multi-line strings? To my understanding, there is no need to escape, but maybe it is preferable due to another reason?
I will be honest did not know that is how it worked in multi-line strings. will change to the plain " since that seems cleaner.
What's the status of this PR?
Tests pass. I quickly scanned through and purely technically this looks good to me.
@timohl if this looks good to you, could you please formally approve?
Each type caster still has to be checked if it actually supports ArrayLike behaviour (e.g., passing a list).
I just changed each occurrence of np.ndarray without actually checking if the caster code matches the new signature.
Shortly looking over the diff, I see the signature of the following casters being changed by this PR:
- [x]
numpy: array_t - [x]
matrix:dense - [x]
matrix:map - [x]
matrix:ref - [x]
matrix:other - [x]
tensor:normal - [x]
tensor:map
I will go through those and also write short tests for that. Should be done this evening.
I have added tests and some fixes in #5502.
The casters of Eigen::TensorMap and Eigen::Ref do not accept ArrayLike types (like list).
In order to reflect this in the signature, I added return_descr() forcing the use of NDArray.
@InvincibleRMC, Since I do not have write permission to your fork, could you merge the last commit 0f4cadd into this PR branch? You could check out and cherry-pick the commit, like this:
gh pr checkout 5502
git checkout numpy.typing-and-empty-tuple
git cherry-pick 0f4cadd
After this, I think this looks good, but it would be great if you could check my changes @InvincibleRMC.
All your changes look good to me @timohl .
Awesome, thanks!
Is the PR description still up-to-date?
I glanced through again and it looks good to me, but I want to run the changes in the .py tests by a typing expert for a quick independent opinion, hence my question about the PR description.
Here is ChatGPT's opinion:
https://chatgpt.com/share/67b008af-e758-8008-973b-ac4345b44a8c
Looks good to ChatGPT :-)
Is the PR description still up-to-date?
I updated the description to mention the use of numpy.typing.ArrayLike as well.
It would be good to add a small example to the description like this:
Previously, numpy arrays and eigen types would display in type hints like this:
numpy.ndarray[numpy.float64[m, 1], flags.writeable]
In order to have valid type hints and reflect that pybind is able to cast, e.g., lists to arrays automatically, this PR changes type hints to:
For args: typing.Annotated[numpy.typing.ArrayLike, numpy.float64, "[m, 1]", "flags.writable"]
For return: typing.Annotated[numpy.typing.NDArray[numpy.float64], "[m, 1]", "flags.writable"]
Great, thanks @rwgk for your continued efforts in reviewing!
This should be the last PR required for me to introduce typing to Open3D.
Are there any plans when the next version bump will be, so I can pin the Open3D build to a specific version (and not a commit hash)? v2.13.6 was on Sep 14, 2024, but I see you are working on merging the smart_holder branch for v3.0.0, so I should probably wait for that?
Great, thanks @rwgk for your continued efforts in reviewing!
This should be the last PR required for me to introduce typing to Open3D.
Are there any plans when the next version bump will be, so I can pin the Open3D build to a specific version (and not a commit hash)? v2.13.6 was on Sep 14, 2024, but I see you are working on merging the smart_holder branch for v3.0.0, so I should probably wait for that?
Yep. I merged #5530 a couple hours ago. I'll be looking for reviewers for doc updates related to merging the smart_holder functionality ;-)