Improve performance of enum_ operators by going back to type-specific implementations
Description
This improves the performance of enum_ operators by no longer attempting to funnel them all through a generic implementation, which caused additional overhead related to calling int().
NOTE: This PR was fully reviewed and is approved. However, because of the breaking change (see below), it will be merged only after the next patch release.
Behavior change: Multiple operator overloads
This PR changes how enum operators are implemented for convertible enums (enums that can be implicitly converted to their underlying integer type). Previously, operators like __eq__, __ne__, and arithmetic operators used a single generic implementation that handled both enum-to-enum and enum-to-scalar comparisons internally.
New implementation:
- For convertible enums, operators now use multiple type-specific overloads instead of a single generic implementation
- For example, __eq__ now has two separate overloads:
  - __eq__(self: MyEnum, other: MyEnum, /) -> bool, for enum-to-enum comparison
  - __eq__(self: MyEnum, other: int, /) -> bool, for enum-to-scalar comparison
- Similarly, the comparison and bitwise operators (<, >, <=, >=, &, |, ^, etc.) now have separate overloads for enum-to-enum and enum-to-scalar operations
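The dispatch idea behind "several type-specific overloads instead of one generic implementation" can be modeled in plain Python. This is a hypothetical, simplified sketch: OverloadSet, MyEnum and eq are illustrative names, not pybind11 APIs. Each overload declares its parameter types, overloads are tried in registration order, and the first match runs. Real pybind11 dispatch also has a pass that allows implicit conversions, which this model ignores.

```python
from typing import Callable, List, Tuple


class OverloadSet:
    """Hypothetical model of a callable with several typed overloads."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.overloads: List[Tuple[Tuple[type, ...], Callable]] = []

    def register(self, *types: type) -> Callable[[Callable], Callable]:
        # Decorator that records an implementation together with its parameter types.
        def decorator(fn: Callable) -> Callable:
            self.overloads.append((types, fn))
            return fn
        return decorator

    def signatures(self) -> List[str]:
        # One entry per overload, much as a multi-overload docstring lists each signature.
        return [
            f"{self.name}(" + ", ".join(t.__name__ for t in types) + ")"
            for types, _ in self.overloads
        ]

    def __call__(self, *args):
        # Try overloads in registration order; the first whose types all match wins.
        for types, fn in self.overloads:
            if len(args) == len(types) and all(
                isinstance(arg, tp) for arg, tp in zip(args, types)
            ):
                return fn(*args)
        # No match: let Python fall back, as is conventional for binary operators.
        return NotImplemented


class MyEnum:
    """Hypothetical stand-in for a convertible enum (not a real py::enum_)."""

    def __init__(self, value: int) -> None:
        self.value = value

    def __eq__(self, other):
        return eq(self, other)

    def __hash__(self) -> int:
        # Keep hashing consistent with equality against the underlying int.
        return hash(self.value)


eq = OverloadSet("__eq__")


@eq.register(MyEnum, MyEnum)
def _eq_enum_enum(self, other):
    # enum-to-enum: compare underlying values directly, no int() round trip
    return self.value == other.value


@eq.register(MyEnum, int)
def _eq_enum_int(self, other):
    # enum-to-scalar: compare the underlying value with the integer
    return self.value == other
```

In this model, MyEnum(1) == MyEnum(1) and MyEnum(1) == 1 each go straight to a dedicated implementation, and eq.signatures() yields one entry per overload, which is the same reason the generated docstrings now list several signatures.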
Impact:
- Performance: This change eliminates Python object conversion overhead (int() calls) by using direct C++ comparisons, resulting in the ~2x performance improvement shown in the benchmarks.
- Docstrings: When pybind11 generates docstrings for operators with multiple overloads, it lists all available signatures. This means the docstring format changes from showing a single signature to showing multiple signatures (one per overload).
- API compatibility: The runtime behavior remains the same; users can still compare enums to enums or enums to scalars exactly as before. Only the internal implementation and the docstring format have changed.
- Test updates: The enum operator docstring tests were updated to accommodate the new multi-overload docstring format. They now check that the docstring starts with the operation name and contains the expected signature(s) anywhere in the docstring (not necessarily at the start).
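The relaxed docstring check can be sketched as a small predicate. doc_matches and the example docstring text are hypothetical: the module name m is assumed, and the exact wording pybind11 generates for overloaded functions may differ from the header assumed here. The point is that an overloaded docstring no longer begins with one specific signature, so the check verifies the prefix and then searches for each expected signature anywhere.

```python
from typing import Iterable


def doc_matches(doc: str, op_name: str, expected_signatures: Iterable[str]) -> bool:
    """Hypothetical helper, not pybind11's actual test code.

    True if the docstring starts with the operator name and contains every
    expected signature somewhere, so single- and multi-overload formats both pass.
    """
    return doc.startswith(op_name) and all(sig in doc for sig in expected_signatures)


# Illustrative multi-overload docstring (assumed layout, module name "m" is hypothetical).
overloaded_doc = (
    "__eq__(*args, **kwargs)\n"
    "Overloaded function.\n\n"
    "1. __eq__(self: m.MyEnum, other: m.MyEnum, /) -> bool\n\n"
    "2. __eq__(self: m.MyEnum, other: int, /) -> bool\n"
)

# Illustrative single-overload docstring that lacks the enum-to-scalar signature.
single_doc = "__eq__(self: m.MyEnum, other: m.MyEnum, /) -> bool\n"

EXPECTED = [
    "__eq__(self: m.MyEnum, other: m.MyEnum, /) -> bool",
    "__eq__(self: m.MyEnum, other: int, /) -> bool",
]
```

In a real test the first argument would be the operator's __doc__ attribute, e.g. assert doc_matches(m.MyEnum.__eq__.__doc__, "__eq__", EXPECTED), assuming a compiled test module named m.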
Rationale for optimizing py::enum_
While py::enum_ was declared deprecated in pybind11 v3.0.0 in favor of py::native_enum, many existing codebases still rely on py::enum_ and cannot be migrated overnight. Large projects with extensive enum usage require careful planning and testing to transition to py::native_enum. This optimization provides immediate performance benefits for these existing codebases during their migration period, reducing the performance gap between py::enum_ and py::native_enum from approximately 18x slower to approximately 9x slower (based on the benchmark results below). For new code, py::native_enum remains the recommended choice as it offers the best performance and is the long-term supported API.
Benchmark results
Measured using https://github.com/swolchok/pybind11_benchmark/tree/8a6f19d17c362dc2060dd8461b502b98c3226a47 (the tip of the benchmark-updates branch at the time of writing):
Enum equality comparison
Command: python -m timeit --setup 'from pybind11_benchmark import MyEnum; x = MyEnum.ONE' 'x != x'
Times are in nsec/loop.
M4 Mac, before: 165, 167, 166, 164, 167
Mac, after: 78.9, 78.9, 79.7, 79.9, 80.5
Enum ordering comparison
Command: python -m timeit --setup 'from pybind11_benchmark import MyEnum; x = MyEnum.ONE' 'x < x'
Mac, before: 170, 168, 168, 171, 168
Mac, after: 79.5, 78.8, 80.8, 81.3, 82.3
(i.e., no difference between != and <)
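To collect numbers like these from a script rather than the command line, a minimal sketch using the standard timeit module follows. nsec_per_loop is a hypothetical helper; like python -m timeit, it picks the loop count with Timer.autorange() and reports the best of several repeats, and it does this once per run so that each run yields one figure, as in the lists above.

```python
import timeit
from typing import List


def nsec_per_loop(stmt: str, setup: str = "pass", runs: int = 5, repeat: int = 5) -> List[float]:
    """Hypothetical helper: one 'best of `repeat`' nsec/loop figure per run."""
    timer = timeit.Timer(stmt, setup=setup)
    # autorange() grows the loop count until one timing takes at least 0.2 s.
    number, _ = timer.autorange()
    results = []
    for _ in range(runs):
        # repeat() returns total seconds for `number` loops, once per repetition.
        best = min(timer.repeat(repeat=repeat, number=number))
        results.append(best / number * 1e9)
    return results
```

Assuming pybind11_benchmark is built and importable, nsec_per_loop("x < x", setup="from pybind11_benchmark import MyEnum; x = MyEnum.ONE") mirrors the ordering-comparison command. Absolute numbers will of course depend on the machine.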
For comparison, the performance of calling a method on a simple pybind11-bound class:
Command: python -m timeit --setup 'from pybind11_benchmark import MyInt; x = MyInt()' 'x.get()'
Mac: 54.6, 54.6, 54.9, 55.3, 55.3
Also for comparison, the performance of a py::native_enum:
Command: python -m timeit --setup 'from pybind11_benchmark import MyNativeEnum; x = MyNativeEnum.THREE' 'x < x'
Mac: 9.12, 9.13, 9.2, 9.21, 9.34
(I note that the above benchmarks do have a tendency toward monotonically increasing times across runs, but that effect seems to be much smaller than the effect of the code changes.)
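The headline ratios can be checked against the raw numbers. The following sketch takes the median of each set of five runs (using the median is our choice; the runs could be summarized differently) and computes the before/after speedup and the remaining gap to py::native_enum.

```python
from statistics import median

# nsec/loop figures copied from the benchmark results above (Mac).
ne_before = [165, 167, 166, 164, 167]      # x != x, before
ne_after = [78.9, 78.9, 79.7, 79.9, 80.5]  # x != x, after
lt_before = [170, 168, 168, 171, 168]      # x < x, before
lt_after = [79.5, 78.8, 80.8, 81.3, 82.3]  # x < x, after
native_lt = [9.12, 9.13, 9.2, 9.21, 9.34]  # x < x with py::native_enum

speedup_ne = median(ne_before) / median(ne_after)   # 166 / 79.7 ≈ 2.08
speedup_lt = median(lt_before) / median(lt_after)   # 168 / 80.8 ≈ 2.08
gap_before = median(lt_before) / median(native_lt)  # 168 / 9.2 ≈ 18.3
gap_after = median(lt_after) / median(native_lt)    # 80.8 / 9.2 ≈ 8.8

print(f"!= speedup: {speedup_ne:.2f}x, < speedup: {speedup_lt:.2f}x")
print(f"gap to native_enum: {gap_before:.1f}x before, {gap_after:.1f}x after")
```

With these medians both comparisons speed up by roughly 2.1x, and the gap to py::native_enum shrinks from about 18x to about 9x, consistent with the figures quoted in the rationale section.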
Code size:
- The marginal code cost of one py::arithmetic enum_ before this PR, as measured on my Mac by adding an extra enum to pybind11_benchmark (specifically https://github.com/swolchok/pybind11_benchmark/tree/8a6f19d17c362dc2060dd8461b502b98c3226a47), was a little over 8 KiB of __text, plus about 1000 bytes of __gcc_except_tab and negligible amounts in other sections. After this PR, the marginal cost increases to a little over 17000 bytes of __text, almost 2000 bytes of __gcc_except_tab, and a few hundred bytes in other sections. I believe @Skylion007 previously mentioned that this seemed like a reasonable order of magnitude of marginal cost.
- Interestingly, the baseline size of that commit of pybind11_benchmark decreased: __text fell by about 12500 bytes and __gcc_except_tab fell by a little over 2000 bytes, though there were negligible size increases in other sections.
- The second commit on this branch, entitled "outline call_impl to save on code size", is specifically a code size mitigation. It is not necessary for correctness and can be dropped if we don't feel it is worthwhile.
Suggested changelog entry:
- Improve performance of operators for py::enum_s, though py::native_enum is still much faster.
The test failures look like they're caused by disagreement on how many move operations we perform, and specifically by the "outline call_impl to save on code size" commit. I am unclear about how important it is to minimize the number of move operations, so I've tentatively added another commit that should make the tests pass for C++17, and we can discuss what to do from here.
Hi @swolchok, I used Cursor to review this PR. It generated the four added commits, all of which are relatively minor polishing changes:
- 279c72a04f6cb3062ee862713005c4f37d4d47f3 Add static assertion for function_ref lifetime safety in call_impl
- a580ccebf390661f3fb725192827996d19c51dbb Add #undef cleanup for enum operator macros
- 0287ec6858f002143752711a9876154b7202fb7d Rename PYBIND11_THROW to PYBIND11_ENUM_OP_THROW_TYPE_ERROR
- 35b4b8f26d952602f544e1c34139629186046208 Clarify comments in function_ref.h
I'm waiting for the CI to see if they work on all platforms.
I also added two sections to the PR description to explain the behavior change and why we're still optimizing a deprecated type.
Could you please review my commits and the new sections in the PR description?
LGTM
Thanks! I think this is ready for merging, but ...
Regarding my comment from yesterday:
Because of the behavior change, I think it's best to merge this PR only after the v3.0.2 patch release. I.e. I plan to merge this along with #5879 to start v3.1.0a0.