implement preds.py as a Rust wheel with PyO3 & Maturin
This change is a proof of concept for how possible it would be to take individual python modules and re-implement them in Rust, in a "Ship of Theseus" style Rust rewrite.
This takes some of the simplest parts of the codebase - a list of
predicate free functions - and turns them into Rust checks. These
functions become quite a bit more trivial in Rust as Rust implements
many of these functions on char, so we simply delegate to those
methods.
These functions where chosen not because they're a slow path, but more because they are straightforward functions that return booleans and show demonstrate a good proof of concept. They certainly don't make things slower, however.
The next steps would be to take some of the bigger functions in preds.py that do full string comparison (such as isXMLishTagname), and port those. This was avoided for now as these might be slightly more controversial, as we might want to include dependencies for fast string matching.
I'm not very well versed in Python, this was mostly cobbled together using https://medium.com/@MatthieuL49/a-mixed-rust-python-project-24491e2af424 and https://colliery.io/blog/rust-python-pattern/ as guides.
Some completely non empirical evaluation of timings, I ran the tests with/without Rust bindings:
$ time BIKESHED_USE_RUST=0 ./bikeshed.py test --no-update
Running tests |████████████████████| 502/502 [100%] in 3:04.5 (2.72/s)
✔ All tests passed.
________________________________________________________
Executed in 184.82 secs fish external
usr time 176.82 secs 717.00 micros 176.82 secs
sys time 1.99 secs 0.00 micros 1.99 secs
$ time BIKESHED_USE_RUST=1 ./bikeshed.py test --no-update
Running tests [R] |████████████████████| 502/502 [100%] in 3:04.4 (2.72/s)
✔ All tests passed.
________________________________________________________
Executed in 184.72 secs fish external
usr time 177.16 secs 481.00 micros 177.16 secs
sys time 1.96 secs 172.00 micros 1.96 secs
As expected both take essentially the same time. This may prove 1 of 2 things:
- I'm an idiot and haven't done this right.
- The FFI boundary between Python/Rust isn't costing us much (at least for simple checks like these).
Oooh, hell yeah. I've been meaning to do this exact thing for a while.
Note that I've got a huge change to the custom parser living in a branch, tho the edits to this file in particular are minimal. Don't go converting parser.py, tho. 😁
Another good test case will be the serializer, I think. Self-contained, lots of string building that I think is slow in Python, and a noticable chunk of time in large documents.
I won't be able to dig into this more immediately, as my vacation starts today, but I'm very excited to find in when I get back.