can_ada [Experiment] Switching from pybind11 to nanobind for function call overhead improvements

Switching from pybind11 to nanobind offers some performance improvements with minimal code changes. Our new benchmarks are:

------------------------------------------------------------------------------------- benchmark: 3 tests ------------------------------------------------------------------------------------
Name (time in ms)              Min                 Max                Mean            StdDev              Median               IQR            Outliers      OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_can_ada_parse         38.3641 (1.0)       38.7200 (1.0)       38.5535 (1.0)      0.0861 (1.0)       38.5595 (1.0)      0.1098 (1.0)           9;0  25.9380 (1.0)          26           1
test_ada_python_parse     111.0045 (2.89)     111.3101 (2.87)     111.1474 (2.88)     0.1099 (1.28)     111.1436 (2.88)     0.1624 (1.48)          4;0   8.9971 (0.35)         10           1
test_urllib_parse         255.1016 (6.65)     275.0980 (7.10)     259.3193 (6.73)     8.8238 (102.44)   255.5814 (6.63)     5.3559 (48.77)         1;1   3.8562 (0.15)          5           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

I'm routinely seeing 6-7x better performance than urllib, and significantly better performance when actually using the results (i.e., accessing result.pathname), thanks to lower attribute access overhead.
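
For anyone who wants to see the attribute access effect separately from parsing, here is a rough timing sketch. It is not the benchmark suite that produced the table above: the URL list and the helper names (`URLS`, `time_case`, `run`) are made up for illustration, and the can_ada cases are only timed if the package is importable.

```python
import timeit
from urllib.parse import urlparse

try:
    import can_ada  # optional; the can_ada cases are skipped if it is missing
except ImportError:
    can_ada = None

# Hypothetical inputs. Distinct URLs are used so that urllib's internal
# parse cache contributes little to the measurement.
URLS = [f"https://example.com/items/{i}/detail?page={i}#top" for i in range(2000)]


def time_case(func, repeat=5):
    """Best-of-`repeat` wall time, in ms, for applying `func` to every URL."""
    def loop():
        for url in URLS:
            func(url)
    return min(timeit.repeat(loop, number=1, repeat=repeat)) * 1000.0


def run():
    """Time parse-only and parse-plus-attribute-access for each parser."""
    cases = {
        "urllib: parse only": lambda u: urlparse(u),
        "urllib: parse + .path": lambda u: urlparse(u).path,
    }
    if can_ada is not None:
        cases["can_ada: parse only"] = lambda u: can_ada.parse(u)
        cases["can_ada: parse + .pathname"] = lambda u: can_ada.parse(u).pathname
    return {name: time_case(func) for name, func in cases.items()}


if __name__ == "__main__":
    for name, ms in run().items():
        print(f"{name:30s} {ms:8.2f} ms")
```

The gap between each "parse only" row and its "parse +" row is a rough proxy for attribute access cost. Absolute numbers will differ from the table above, since the inputs and harness are different.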

However, this introduces CMake as a build-time dependency and reduces the available targets (CPython 3.8+, PyPy > 3.8). I have not yet found a way to eliminate CMake as a dependency. I don't really mind if we only target newer versions of Python.

@lemire @wjakob

Mar 17 '24 21:03 TkTech

@TkTech Did you see the instructions I included here? https://github.com/wjakob/nanobind/blob/master/src/nb_combined.cpp. This should allow you to compile with essentially any other kind of build system, though some work will be needed to replicate all the bells and whistles that nanobind's CMake tooling provides out of the box. Out of curiosity, what's the relative speedup over the previous pybind11-based version?
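
As a rough illustration of what a CMake-free build might involve, the sketch below assembles a single compiler invocation that builds nb_combined.cpp together with the extension's own sources. Everything in it is an assumption rather than something confirmed in this thread: the function name `nanobind_compile_command`, the checkout layout (`include/`, `ext/robin_map/include/`, `src/nb_combined.cpp`), the flag set (GCC or Clang on Linux), and the example file names. The comments in nb_combined.cpp itself remain the authoritative list of required flags.

```python
import sysconfig
from pathlib import Path


def nanobind_compile_command(nanobind_root, sources, output, cxx="c++"):
    """Return an argv list for a one-step build of a nanobind extension.

    Assumes a nanobind source checkout at `nanobind_root` and a GCC/Clang
    style compiler. macOS and Windows need additional linker options that
    are not covered here.
    """
    root = Path(nanobind_root)
    # Platform-specific extension suffix, e.g. ".cpython-312-x86_64-linux-gnu.so"
    ext_suffix = sysconfig.get_config_var("EXT_SUFFIX") or ".so"
    return [
        cxx,
        "-std=c++17",            # nanobind requires C++17
        "-fvisibility=hidden",   # keep binding symbols out of the export table
        "-fPIC", "-shared", "-O2",
        f"-I{sysconfig.get_paths()['include']}",       # Python headers
        f"-I{root / 'include'}",                       # nanobind headers
        f"-I{root / 'ext' / 'robin_map' / 'include'}", # bundled robin_map (assumed path)
        str(root / "src" / "nb_combined.cpp"),         # nanobind's combined source file
        *[str(s) for s in sources],                    # the extension's own sources
        "-o", f"{output}{ext_suffix}",
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(nanobind_compile_command("/path/to/nanobind", ["src/binding.cpp"], "can_ada")))
```

The returned list could be passed to `subprocess.run` from a custom build step, which is one way to avoid CMake at the cost of maintaining the flags by hand.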

Mar 19 '24 08:03 wjakob

@wjakob That's fantastic, I'll give it a full read this weekend and give it a try.

Relative speedup is 30-33%.

Mar 19 '24 16:03 TkTech

Any update on this? I'd like to see a switch to nanobind. I was going to implement a Cython version, but if there is a nanobind version there is no need, since it's pretty much as fast as Cython.

Let me know if I can help!

Mar 07 '25 05:03 raceychan