pybind11
pybind11 copied to clipboard
Experimenting: `Annotated[Any, pybind11.CppType("cpp_namespace::UserType")]`
Description
For context see https://github.com/pybind/pybind11/pull/4876#issuecomment-1768750707 and surrounding comments.
Note: PR #5073 was split out from this PR and released with pybind11 v2.12.0
TODO:
- Wait for feedback: https://github.com/python/mypy/issues/16306
Suggested changelog entry:
@sizmailov The idea here is to not knowingly generate docstrings that stubgen cannot possibly process correctly.
stubgen 1.5.1 does not process the Annotated[Any, "cpp_namespace::UserType"] as desired.
What do you think, is this a useful direction? Could stubgen be updated to process Annotated as desired?
Some new information:
I ran global testing with this PR. Through that I discovered 3 .pyi files that look better, for example (manipulated because it's closed source):
- def foo(self, *args, **kwargs) -> Any: ...
+ def foo(self, value) -> None: ...
That makes me think stubgen already has some functionality for handling Annotated[Any, "cpp_namespace::UserType"], but it's not applied recursively.
What do you think, is this a useful direction? Could stubgen be updated to process Annotated as desired?
What would be the desired output? The following should be easily achievable:
def values(self) -> Iterator[Annotated[Any, "cpp_namespace::UserType"]]: ...
As a user of a python extension I would prefer to have this:
def values(self) -> Iterator[py_module.UserType]: ...
It's possible but requires information on mapping C++ types to Python, which is unavailable for stubgen. The only place that has this mapping is the binding code (e.g. pybind11_protobuf). Not sure how this can be extracted and passed to stubgen...
If I would go with Annotated I'd put some "label" around type string to leave no room for mistake (for 3rd-party tools that make use of annotations), e.g.:
def values(self) -> Iterator[Annotated[Any, Pybind11FromCppType("cpp_namespace::UserType")]]: ...
def values(self) -> Iterator[Annotated[Any, Pybind11FromCppType("cpp_namespace::UserType")]]: ...
I'll try that, thanks!
What I'm aiming for here is a last-ditch effort to produce a docstring that does not trip up stubgen. The current behavior is to simply let it fall on the floor.
As a user of a python extension I would prefer to have this:
def values(self) -> Iterator[py_module.UserType]: ...
That only works for py::class_- or py::enum_-wrapped types. The implementation is here (right above the code this PR is changing):
- https://github.com/pybind/pybind11/blob/7969049de4c11f1d4955ababd97f340d82d2bf5a/include/pybind11/pybind11.h#L484-L493
There is no existing mechanism for replacing the % placeholder (originating from here) at runtime with a fully-qualified Python name. Ideas that come to mind: maybe another registry; maybe a magic classmethod. But even if implemented, that can fail, too, so a robust fallback like what this PR is about will always have a value.
I decided to go with
def values(self) -> Iterator[Annotated[Any, CppTypePybind11("cpp_namespace::UserType")]]: ...
For the 5 .pyi files that I hit on via global testing before, inserting CppTypePybind11() does not make a difference. I like the extra clarity though, therefore I initiated another global testing run with this updated PR.
Another data point, result from globally testing this PR @ commit 272152e68b062749e892dbb362ced079064fd666, which inserts CppTypePybind11() like this:
Annotated[Any, CppTypePybind11("cpp_namespace::UserType")]
There is only 1 new test failure, in tensorflow, a comparison with a golden file. That should be easy to deal with.
@sizmailov do have connections to the mypy stubgen owners?
I feel it would be ideal to get their input before we make a move here. Concrete questions:
- What's their overall high-level reaction to using
Annotated[Any, CppTypePybind11("cpp_namespace::UserType")](or similar)?
It seems like mypy does handle outer-level Annotated, but not nested ones, e.g.
Iterator[str, Annotated[Any, CppTypePybind11("cpp_namespace::UserType")]]
- What are the chances that mypy will be changed to support such nested
Annotated?
@sizmailov do have connections to the mypy stubgen owners?
I went ahead and created https://github.com/python/mypy/issues/16306
type, exception, memoryview and bytearray also miss the handle_type_name.
Not sure about dtype, module_, generic_type and exception, since they are not in pytypes.h.
The rest seems to be properly handled.
grep -Poh 'class \w+ : public object ' include/pybind11/*.h
class dtype : public object
class module_ : public object
class generic_type : public object
class exception : public object
class iterator : public object
class type : public object
class iterable : public object
class str : public object
class bytes : public object
class bytearray : public object
class none : public object
class ellipsis : public object
class bool_ : public object
class int_ : public object
class float_ : public object
class weakref : public object
class slice : public object
class capsule : public object
class tuple : public object
class dict : public object
class sequence : public object
class list : public object
class anyset : public object
class function : public object
class staticmethod : public object
class buffer : public object
class memoryview : public object
type, ~exception,~memoryviewandbytearrayalso miss thehandle_type_name.
Done: 3c209443552e0be509592f2bab6f6221353e2e5d
Not sure about
dtype,
Done: 6a3a954a6b85d2f5dd770f7ab08d12bf1052a1b5
module_,
Done: 74817b7f9f384cd5fd15d5765d3c6594adac2e0a
since they are not in
pytypes.h.
There is already a precedent for that, the existing handle_type_name<> specializations in numpy.h:
-
https://github.com/pybind/pybind11/blob/3414c56b6c7c521d868c9a137ca2ace2e26b5b2e/include/pybind11/numpy.h#L48-L51
-
https://github.com/pybind/pybind11/blob/3414c56b6c7c521d868c9a137ca2ace2e26b5b2e/include/pybind11/numpy.h#L1971-L1975
generic_typeand
That's what the % placeholder was originally invented for (I guess), and is handled in the code I pointed out before:
- https://github.com/pybind/pybind11/blob/3414c56b6c7c521d868c9a137ca2ace2e26b5b2e/include/pybind11/pybind11.h#L484-L493
exception,
My current conclusion for this: 169b0e57ebda14c47f159823ea57c1971f713121
That class is a real oddball. The first thing that stands out, the type template parameter is not used. It's not really a type at all, but more of a factory for producing types derived from Exception. There is no constructor that takes a py::handle and checks if the object is actually derived from Exception. Therefore it cannot be an argument of a wrapped function:
m.def("pass_exception_void", [](const py::exception<void>&) {}); // Does not compile.
pytypes.h:459:36: error: could not convert ‘{h, pybind11::object::borrowed_t()}’ from ‘<brace-enclosed initializer list>’ to ‘pybind11::exception<void>’
459 | return {h, object::borrowed_t{}};
It can only appear as a return type, but without subversive steps only to return a newly created type, which fails at runtime. See added test.
I believe it's good that Annotated[...] appears in the error. I wouldn't want to suggest this is the right way to go, by working to produce something different.
@sizmailov Unless you have more suggestions, I'll stop working on this now, waiting for feedback from the stubgen side.
I think this PR only makes sense if there are cooperating changes in stubgen.
@sizmailov I'm interested in your opinion regarding d14d91e02afba4046c9a899ad47a20220ac1f1a5, could you please take a look?
The main idea is to remove the handle_type_name default implementation, to enforce that all types are explicitly specialized, to guard against oversights.
That generates the question: What do we actually want in the added specializations? — I'm not sure, any advice is appreciated.
Logging the results of a simple experiment: Google global testing with the current version of this PR:
(Internal test ID: OCL:593010609:BASE:593028799:1703228284011:fdcd5c9a)
22 build failures (22 unique)
$ grep 'cast.h:1208:32' * | cut -d"'" -f2 | sort | uniq -c
6 pybind11::detail::handle_type_name<jax::(anonymous namespace)::PjitFunction::pyobject>
6 pybind11::detail::handle_type_name<jax::PmapFunction::pyobject>
6 pybind11::detail::handle_type_name<pybind11::enum_<xla::OpSharding_ShardGroupType>>
6 pybind11::detail::handle_type_name<pybind11::enum_<xla::OpSharding_Type>>
45 pybind11::detail::handle_type_name<xla::PyArray>
It would probably take a few hours of effort to get those fixed (by adding handle_type_name specializations).
It might be too disruptive to release pybind11 without the handle_type_name default implementation, at least without a transition plan (e.g. first a release with opt-in, later a release changing to opt-out).