pybind11
[BUG]: iterators are needlessly slow / need an exception-free way of finishing.
Required prerequisites
- [X] Make sure you've read the documentation. Your issue may be addressed there.
- [X] Search the issue tracker and Discussions to verify that this hasn't already been reported. +1 or comment there if it has.
- [ ] Consider asking first in the Gitter chat room or in a Discussion.
What version (or hash if on master) of pybind11 are you using?
2.13.1
Problem description
Currently, the only way to have an iterator finish is by throwing py::stop_iteration{}. While this is "pythonic", it also incurs huge overhead, especially on short-lived iterators.
I was benchmarking a utility I wrote in C++ that iterates over a lot of files and parses text fields from each file. The files are organized in a specific directory structure that is reflected as three layers of directory iterators, leading to a total of ~25k iterators being created. The C++ program took 0.8s to execute, 0.47s of which were spent waiting on IO. The equivalent Python code exposed via pybind11 took 4s to execute.
When profiling, I saw that 15% of the time was spent in exception handling (or up to 40% when using libunwind or llvm-libunwind, bumping execution time to 6s). This seems like low-hanging fruit compared to all the other pybind11-induced overhead.
Sadly, I couldn't come up with a good way to solve this just yet. Perhaps pybind11 could add a std::optional-esque container that wraps the iterator return type + a tag on whether the iterator is at its end?
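To make the idea concrete, here is a minimal sketch in plain C++ (C++17, no pybind11 involved) that contrasts the two ways of signalling the end of iteration. Every name in it (`StopIteration`, `IntIterator`, `next_or_throw`, `next_or_empty`, and the two `sum_*` helpers) is hypothetical and made up for illustration; none of it is pybind11 API. It only shows that an empty `std::optional` can carry the "at end" tag without any stack unwinding.

```cpp
#include <cstddef>
#include <exception>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for py::stop_iteration; NOT the pybind11 type.
struct StopIteration : std::exception {};

// Hypothetical iterator over a list of ints, offering both styles.
class IntIterator {
public:
    explicit IntIterator(std::vector<int> values) : values_(std::move(values)) {}

    // Current style: exhaustion is reported by throwing.
    int next_or_throw() {
        if (pos_ >= values_.size()) throw StopIteration{};
        return values_[pos_++];
    }

    // Suggested style: an empty optional is the "at end" tag.
    std::optional<int> next_or_empty() {
        if (pos_ >= values_.size()) return std::nullopt;
        return values_[pos_++];
    }

private:
    std::vector<int> values_;
    std::size_t pos_ = 0;
};

// Drains the iterator through the throwing interface.
int sum_with_exceptions(IntIterator it) {
    int total = 0;
    try {
        for (;;) total += it.next_or_throw();
    } catch (const StopIteration&) {
        // Reached exactly once per iterator: one throw/unwind per iterator.
    }
    return total;
}

// Drains the iterator through the optional interface; no exception involved.
int sum_with_optional(IntIterator it) {
    int total = 0;
    while (auto value = it.next_or_empty()) total += *value;
    return total;
}
```

Both helpers visit the same elements; the difference is only in how the end is reported. Whether pybind11 could translate an empty result into "iteration finished" without raising is an assumption here. As far as I understand, CPython's `tp_iternext` slot may return NULL without an exception set to signal exhaustion, so there seems to be room for it on the Python side.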
I also found no way to signal the end of the iterator without throwing py::stop_iteration. If I missed something obvious, please yell at me.
Reproducible example code
No response
Is this a regression? Put the last known working version here if it is.
Not a regression
I see the same issue for any exception and started a thread here: https://github.com/pybind/pybind11/discussions/5317