implement preds.py as a Rust wheel with PyO3 & Maturin

Open keithamus opened this issue 3 months ago • 1 comments

This change is a proof of concept for how feasible it would be to take individual Python modules and re-implement them in Rust, as a "Ship of Theseus" style rewrite.

This takes some of the simplest parts of the codebase, a set of free predicate functions, and turns them into Rust checks. These functions become quite a bit more trivial in Rust, since Rust already implements many of them as methods on char, so we simply delegate to those methods.

These functions were chosen not because they're a slow path, but because they are straightforward functions that return booleans and make for a good proof of concept. They certainly don't make things slower, however.

The next step would be to take some of the bigger functions in preds.py that do full string comparison (such as isXMLishTagname) and port those. I avoided these for now because they might be slightly more controversial: we might want to include dependencies for fast string matching.

I'm not very well versed in Python; this was mostly cobbled together using https://medium.com/@MatthieuL49/a-mixed-rust-python-project-24491e2af424 and https://colliery.io/blog/rust-python-pattern/ as guides.

As a completely unscientific timing comparison, I ran the tests with and without the Rust bindings:

$ time BIKESHED_USE_RUST=0 ./bikeshed.py test --no-update
Running tests |████████████████████| 502/502 [100%] in 3:04.5 (2.72/s)
✔ All tests passed.
________________________________________________________
Executed in  184.82 secs    fish           external
   usr time  176.82 secs  717.00 micros  176.82 secs
   sys time    1.99 secs    0.00 micros    1.99 secs

$ time BIKESHED_USE_RUST=1 ./bikeshed.py test --no-update
Running tests [R] |████████████████████| 502/502 [100%] in 3:04.4 (2.72/s)
✔ All tests passed.

________________________________________________________
Executed in  184.72 secs    fish           external
   usr time  177.16 secs  481.00 micros  177.16 secs
   sys time    1.96 secs  172.00 micros    1.96 secs

As expected, both take essentially the same time. This may prove one of two things:

  • I'm an idiot and haven't done this right.
  • The FFI boundary between Python/Rust isn't costing us much (at least for simple checks like these).
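A whole-suite timing cannot distinguish between those two possibilities, so here is a minimal sketch of how one might time a single predicate in isolation. Everything named here is an assumption for illustration: the `bikeshed_rs` module, its `is_digit` function, the `is_digit_py` stand-in, and the choice of `"0"` as the default for `BIKESHED_USE_RUST` are all hypothetical and may not match what this change actually does.

```python
import os
import timeit


def is_digit_py(ch: str) -> bool:
    """Pure-Python stand-in for one predicate (hypothetical, not the real preds.py code)."""
    return len(ch) == 1 and "0" <= ch <= "9"


def load_is_digit(env=None):
    """Return the Rust-backed predicate if requested and importable, else the Python one."""
    env = os.environ if env is None else env
    # Assumption: the variable defaults to "0" (Rust bindings off).
    if env.get("BIKESHED_USE_RUST", "0") == "1":
        try:
            # Hypothetical module and function names; the real wheel may differ.
            from bikeshed_rs import is_digit
            return is_digit
        except ImportError:
            pass  # wheel not installed: fall back to the Python version
    return is_digit_py


def time_predicate(pred, calls: int = 200_000) -> float:
    """Seconds taken by `calls` invocations of `pred` on a single character."""
    return timeit.timeit(lambda: pred("7"), number=calls)


if __name__ == "__main__":
    for flag in ("0", "1"):
        pred = load_is_digit({"BIKESHED_USE_RUST": flag})
        source = getattr(pred, "__module__", "?")
        print(f"BIKESHED_USE_RUST={flag} ({source}): {time_predicate(pred):.4f}s")
```

If the two lines report different modules and similar times, the FFI boundary really is cheap for these calls. If both report the same module, the Rust path was never taken.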

keithamus avatar Oct 02 '25 11:10 keithamus

Oooh, hell yeah. I've been meaning to do this exact thing for a while.

Note that I've got a huge change to the custom parser living in a branch, tho the edits to this file in particular are minimal. Don't go converting parser.py, tho. 😁

Another good test case will be the serializer, I think. Self-contained, lots of string building that I think is slow in Python, and a noticeable chunk of time in large documents.

I won't be able to dig into this more immediately, as my vacation starts today, but I'm very excited to dig in when I get back.

tabatkins avatar Oct 02 '25 13:10 tabatkins