pybind11 Improve performance of enum_ operators by going back to specific implementation

Description

This improves the performance of enum_ operators by no longer attempting to funnel them all through a generic implementation, which caused additional overhead related to calling int().

NOTE: This PR was fully reviewed and is approved. However, because of the breaking change (see below), it will be merged only after the next patch release.

Behavior change: Multiple operator overloads

This PR changes how enum operators are implemented for convertible enums (enums that can be implicitly converted to their underlying integer type). Previously, operators such as __eq__, __ne__, and the ordering and bitwise operators used a single generic implementation that handled both enum-to-enum and enum-to-scalar operations internally.

New implementation:

For convertible enums, operators now use multiple type-specific overloads instead of a single generic implementation
For example, __eq__ now has two separate overloads:
- __eq__(self: MyEnum, other: MyEnum, /) -> bool - for enum-to-enum comparison
- __eq__(self: MyEnum, other: int, /) -> bool - for enum-to-scalar comparison
Similarly, the ordering and bitwise operators (<, >, <=, >=, &, |, ^, etc.) now have separate overloads for enum-to-enum and enum-to-scalar operations.
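The overload split can be pictured with a small pure-Python model. This is only a sketch of the dispatch idea, not pybind11's implementation: `MyEnum`, `_eq_enum`, `_eq_int`, and `dispatch_eq` are hypothetical names, the real overloads are C++ functions registered on the bound type, and returning NotImplemented when nothing matches is an assumption about how operator bindings usually behave.

```python
class MyEnum:
    """Hypothetical stand-in for a convertible py::enum_ type."""

    def __init__(self, value: int) -> None:
        self.value = value


def _eq_enum(a: MyEnum, b: object) -> bool:
    # Models: __eq__(self: MyEnum, other: MyEnum, /) -> bool
    if not isinstance(b, MyEnum):
        raise TypeError("argument is not a MyEnum")
    return a.value == b.value


def _eq_int(a: MyEnum, b: object) -> bool:
    # Models: __eq__(self: MyEnum, other: int, /) -> bool
    if not isinstance(b, int):
        raise TypeError("argument is not an int")
    return a.value == b


def dispatch_eq(a: MyEnum, b: object):
    # Try each overload in registration order; the first one whose
    # argument types match produces the result.
    for overload in (_eq_enum, _eq_int):
        try:
            return overload(a, b)
        except TypeError:
            continue
    # Assumption: no matching overload yields NotImplemented.
    return NotImplemented
```

With this model, `dispatch_eq(MyEnum(1), MyEnum(1))` takes the enum-to-enum path and `dispatch_eq(MyEnum(1), 1)` falls through to the enum-to-scalar path, which mirrors the two signatures listed above.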

Impact:

Performance: This change eliminates Python object conversion overhead (int() calls) by using direct C++ comparisons, resulting in the ~2x performance improvement shown in the benchmarks
Docstrings: When pybind11 generates docstrings for operators with multiple overloads, it lists all available signatures. This means the docstring format changes from showing a single signature to showing multiple signatures (one per overload)
API compatibility: The runtime behavior remains the same - users can still compare enums to enums or enums to scalars exactly as before. Only the internal implementation and docstring format have changed
Test updates: The enum operator docstring tests were updated to accommodate the new multi-overload docstring format by checking that the docstring starts with the operation name and contains the expected signature(s) anywhere in the docstring (not necessarily at the start)
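The relaxed docstring check described under "Test updates" could look roughly like the helper below. It is a sketch, not the code from the test suite: `check_operator_doc` is a hypothetical name, and `SAMPLE_DOC` is an illustrative docstring written by hand in the multi-overload layout rather than one copied from the PR.

```python
from typing import Sequence


def check_operator_doc(doc: str, op_name: str, expected: Sequence[str]) -> bool:
    """Return True if doc starts with op_name and contains every expected signature."""
    # The first line must still name the operator ...
    if not doc.startswith(op_name + "("):
        return False
    # ... but each expected signature may appear anywhere in the text.
    return all(signature in doc for signature in expected)


# Illustrative docstring (assumed layout, not taken from the PR).
SAMPLE_DOC = """__eq__(*args, **kwargs)
Overloaded function.

1. __eq__(self: MyEnum, other: MyEnum, /) -> bool

2. __eq__(self: MyEnum, other: int, /) -> bool
"""

ok = check_operator_doc(
    SAMPLE_DOC,
    "__eq__",
    ["__eq__(self: MyEnum, other: MyEnum, /) -> bool"],
)
print(ok)
```

A check of this shape passes for both the old single-signature format and the new multi-overload format, which is why it tolerates the docstring change.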

Rationale for optimizing `py::enum_`

While py::enum_ was declared deprecated in pybind11 v3.0.0 in favor of py::native_enum, many existing codebases still rely on py::enum_ and cannot be migrated overnight. Large projects with extensive enum usage require careful planning and testing to transition to py::native_enum. This optimization provides immediate performance benefits for these existing codebases during their migration period, reducing the performance gap between py::enum_ and py::native_enum from approximately 18x slower to approximately 9x slower (based on the benchmark results below). For new code, py::native_enum remains the recommended choice as it offers the best performance and is the long-term supported API.

Benchmark results

using https://github.com/swolchok/pybind11_benchmark/tree/8a6f19d17c362dc2060dd8461b502b98c3226a47 (the current tip of the benchmark-updates branch):

Enum equality comparison
Command: python -m timeit --setup 'from pybind11_benchmark import MyEnum; x = MyEnum.ONE' 'x != x'
Times are nsec/loop.

M4 Mac, before: 165, 167, 166, 164, 167
Mac, after: 78.9, 78.9, 79.7, 79.9, 80.5

Enum ordering comparison
Command: python -m timeit --setup 'from pybind11_benchmark import MyEnum; x = MyEnum.ONE' 'x < x'

Mac, before: 170, 168, 168, 171, 168
Mac, after: 79.5, 78.8, 80.8, 81.3, 82.3

(i.e., no difference between != and <)

Compare to the performance of calling a method of a simple pybind11-bound class:
Command: python -m timeit --setup 'from pybind11_benchmark import MyInt; x = MyInt()' 'x.get()'

Mac: 54.6, 54.6, 54.9, 55.3, 55.3

Also compare to the performance of a py::native_enum:
Command: python -m timeit --setup 'from pybind11_benchmark import MyNativeEnum; x = MyNativeEnum.THREE' 'x < x'

Mac: 9.12, 9.13, 9.2, 9.21, 9.34

(I note that the above benchmarks do have a tendency toward monotonically increasing times across runs, but that effect seems to be much smaller than the effect of the code changes.)
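As a rough cross-check of the "~2x" and "18x to 9x" figures quoted in this description, the reported timings can be averaged and compared. The variable names below are made up for this sketch, and using the mean of the five runs is an assumption about how the summary numbers were derived.

```python
from statistics import mean

# Timings in nsec/loop, copied from the runs reported above.
enum_ne_before = [165, 167, 166, 164, 167]
enum_ne_after = [78.9, 78.9, 79.7, 79.9, 80.5]
enum_lt_before = [170, 168, 168, 171, 168]
enum_lt_after = [79.5, 78.8, 80.8, 81.3, 82.3]
native_lt = [9.12, 9.13, 9.2, 9.21, 9.34]

# Speedup of py::enum_ operators from this PR.
speedup_ne = mean(enum_ne_before) / mean(enum_ne_after)
speedup_lt = mean(enum_lt_before) / mean(enum_lt_after)

# Remaining gap between py::enum_ and py::native_enum for 'x < x'.
gap_before = mean(enum_lt_before) / mean(native_lt)
gap_after = mean(enum_lt_after) / mean(native_lt)

print(f"!= speedup: {speedup_ne:.2f}x")
print(f"<  speedup: {speedup_lt:.2f}x")
print(f"enum_ vs native_enum before: {gap_before:.1f}x slower")
print(f"enum_ vs native_enum after:  {gap_after:.1f}x slower")
```

The ratios come out slightly above 2x for both operators, and the gap to py::native_enum lands close to the 18x and 9x figures given in the rationale section.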

Code size:

The marginal code cost of one py::arithmetic enum_ before this PR, as measured on my Mac by adding an extra enum to pybind11_benchmark (specifically https://github.com/swolchok/pybind11_benchmark/tree/8a6f19d17c362dc2060dd8461b502b98c3226a47), was a little over 8 KiB of __text, plus about 1000 bytes of __gcc_except_tab and negligible amounts in other sections. After this PR, the marginal cost increases to a little over 17000 bytes of __text, almost 2000 bytes of __gcc_except_tab, and a few hundred bytes in other sections. I believe @Skylion007 previously mentioned that this seemed like a reasonable order of magnitude of marginal cost.
Interestingly, the baseline size of that commit of pybind11_benchmark decreased: __text fell by about 12500 bytes and __gcc_except_tab fell by a little over 2000 bytes, though there were negligible size increases in other sections.
The second commit on this branch, entitled "outline call_impl to save on code size", is specifically a code size mitigation. It is not necessary for correctness and can be dropped if we don't feel it is worthwhile.

Suggested changelog entry:

Improve performance of operators for py::enum_s, though py::native_enum is still much faster.

Oct 31 '25 20:10 swolchok

The test failures look like a disagreement about how many move operations we perform, and they are caused specifically by the "outline call_impl to save on code size" commit. I am unclear about how important it is to minimize the number of move operations, so I've tentatively added another commit that should make the tests work for C++17, and we can talk about what to do from here.

Nov 03 '25 23:11 swolchok

Hi @swolchok, I used Cursor to review this PR. It generated the four added commits, all of which are relatively minor polish:

279c72a04f6cb3062ee862713005c4f37d4d47f3 Add static assertion for function_ref lifetime safety in call_impl
a580ccebf390661f3fb725192827996d19c51dbb Add #undef cleanup for enum operator macros
0287ec6858f002143752711a9876154b7202fb7d Rename PYBIND11_THROW to PYBIND11_ENUM_OP_THROW_TYPE_ERROR
35b4b8f26d952602f544e1c34139629186046208 Clarify comments in function_ref.h

I'm waiting for the CI to see if they work on all platforms.

I also added two sections to the PR description, to explain that there is a behavior change, and why we're still optimizing a deprecated type.

Could you please review my commits and the new sections in the PR description?

Nov 11 '25 20:11 rwgk

> Could you please review my commits and the new sections in the PR description?

LGTM

Nov 11 '25 20:11 swolchok

> > Could you please review my commits and the new sections in the PR description?
>
> LGTM

Thanks! I think this is ready for merging, but ...

wrt my comment from yesterday:

Because of the behavior change, I think it's best to merge this PR only after the v3.0.2 patch release. I.e. I plan to merge this along with #5879 to start v3.1.0a0.

Nov 11 '25 22:11 rwgk
