[Experiment] Switching from pybind11 to nanobind for function call overhead improvements
Switching from pybind11 to nanobind offers some performance improvements with minimal code changes. Our new benchmarks are:
-------------------------------------------------------------------------- benchmark: 3 tests --------------------------------------------------------------------------
Name (time in ms)      Min              Max              Mean             StdDev           Median           IQR              Outliers  OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_can_ada_parse     38.3641 (1.0)    38.7200 (1.0)    38.5535 (1.0)    0.0861 (1.0)     38.5595 (1.0)    0.1098 (1.0)     9;0       25.9380 (1.0)  26      1
test_ada_python_parse  111.0045 (2.89)  111.3101 (2.87)  111.1474 (2.88)  0.1099 (1.28)    111.1436 (2.88)  0.1624 (1.48)    4;0       8.9971 (0.35)  10      1
test_urllib_parse      255.1016 (6.65)  275.0980 (7.10)  259.3193 (6.73)  8.8238 (102.44)  255.5814 (6.63)  5.3559 (48.77)   1;1       3.8562 (0.15)  5       1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
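For reference, the parenthesised ratios in the Mean column can be recomputed from the raw means. This is only a small sketch: the numbers are copied from the table, and the variable names are made up for illustration.

```python
# Mean times in ms, copied from the benchmark table above.
means_ms = {
    "test_can_ada_parse": 38.5535,
    "test_ada_python_parse": 111.1474,
    "test_urllib_parse": 259.3193,
}

# The fastest test (can_ada) is the 1.0 baseline in the table.
baseline = means_ms["test_can_ada_parse"]

# Ratio of each mean to the baseline, i.e. how many times slower it is.
slowdowns = {name: mean / baseline for name, mean in means_ms.items()}

for name, ratio in slowdowns.items():
    print(f"{name}: {ratio:.2f}x")
```

The urllib ratio of roughly 6.7 is where the "6-7x" figure comes from.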
I'm routinely seeing 6-7x better performance than urllib, and significantly better performance when actually using the results (i.e. accessing result.pathname), thanks to lower attribute access overhead.
However, this introduces CMake as a build-time dependency and reduces the set of supported targets (CPython 3.8+, PyPy > 3.8). I have not yet found a way to eliminate CMake as a dependency. I don't really mind if we only target newer versions of Python.
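A rough, stdlib-only way to try a "parse, then touch an attribute" comparison locally might look like the sketch below. It is not the pytest-benchmark suite that produced the table. The URL list, the helper names and the repeat count are all assumptions, and the can_ada part only runs if the package happens to be installed.

```python
import timeit
from urllib.parse import urlparse

# Hypothetical sample data: distinct URLs, so any internal caching in a
# parser has less chance to help.
URLS = [f"https://user:pass@example.com:8080/path/{i}?q={i}#frag" for i in range(1000)]


def parse_all_urllib(urls):
    """Parse every URL with urllib and read one attribute of the result."""
    for u in urls:
        result = urlparse(u)
        _ = result.path


def bench(func, urls, repeat=5):
    """Return the best wall-clock time in ms for one pass over urls."""
    return min(timeit.repeat(lambda: func(urls), number=1, repeat=repeat)) * 1000


results = {"urllib": bench(parse_all_urllib, URLS)}

# can_ada is optional here; skip it if it is not installed.
try:
    import can_ada
except ImportError:
    can_ada = None

if can_ada is not None:
    def parse_all_can_ada(urls):
        """Same loop using can_ada, reading result.pathname as in the comment above."""
        for u in urls:
            result = can_ada.parse(u)
            _ = result.pathname

    results["can_ada"] = bench(parse_all_can_ada, URLS)

for name, ms in results.items():
    print(f"{name}: {ms:.3f} ms")
```

Absolute numbers from a loop like this will not match the table, since the inputs and the measurement method differ. Only the relative ordering is meaningful.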
@lemire @wjakob
@TkTech Did you see the instructions I included here? https://github.com/wjakob/nanobind/blob/master/src/nb_combined.cpp. This should allow you to compile with essentially any other kind of build system, though some work will be needed to replicate all the bells and whistles that nanobind's CMake tooling provides out of the box. Out of curiosity, what's the relative speedup over the previous pybind11-based version?
@wjakob That's fantastic, I'll give it a full read this weekend and give it a try.
Relative speedup is 30-33%.
Any update on this? I'd like to see a switch to nanobind. I was going to implement a Cython version, but if there is a nanobind version there is no need, since it's pretty much as fast as Cython.
Let me know if I can help!