Profile-Guided Optimization (PGO) benchmark report
Hi!
Thank you for the project! I have evaluated Profile-Guided Optimization (PGO) on many projects - all the results are available at https://github.com/zamazan4ik/awesome-pgo . Since this compiler optimization works well in many domains, including databases (SQLite among them), I decided to apply it to this project as well - here are my benchmark results.
Test environment
- Fedora 40
- Linux kernel 6.9.7
- AMD Ryzen 9 5900x
- 48 GiB RAM
- SSD Samsung 980 Pro 2 TiB
- Compiler: rustc 1.79
- Limbo version: main branch on commit 93a634d334a6a0629f516e31691c88742519ceed
- Disabled Turbo Boost
Benchmark
For the benchmarks, I use the benchmarks built into the project. For PGO optimization I use the cargo-pgo tool. The Release bench results were obtained with the taskset -c 0 cargo bench command, the PGO training phase is done with taskset -c 0 cargo pgo bench, and the PGO optimization phase with taskset -c 0 cargo pgo optimize bench.
taskset -c 0 is used to reduce the influence of the OS scheduler on the results. All measurements are done on the same machine, with the same background "noise" (as much as I can guarantee).
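In full, the sequence boils down to three commands, all pinned to a single core:

```shell
taskset -c 0 cargo bench                  # baseline Release results
taskset -c 0 cargo pgo bench              # PGO training run (instrumented build)
taskset -c 0 cargo pgo optimize bench     # PGO-optimized build + final results
```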
Results
I got the following results:
- Release: https://gist.github.com/zamazan4ik/f9fe60e4cb0013c93754d97257fbc668
- PGO optimized compared to Release: https://gist.github.com/zamazan4ik/df2c1961d4ab1faf00d4cb51c2c14b62
- (just for reference) PGO instrumented compared to Release: https://gist.github.com/zamazan4ik/5065fbc26920888e9efddc3c6f207aeb
According to the results, PGO measurably improves Limbo's performance.
The rusqlite numbers didn't improve because cargo-pgo cannot apply PGO to non-Rust code. It is possible to do that by manually passing the corresponding compiler switches to the C part (a rough sketch follows), but I didn't do it during this test since I was only interested in optimizing Limbo's speed.
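For reference, such a setup could look roughly like this - an untested sketch assuming Clang-style PGO switches and that the C dependencies' build scripts pick up CFLAGS; the merged profile path is a placeholder:

```shell
# Instrument the C parts and run the training workload
# (a cargo clean between the steps may be needed to force a C rebuild):
CFLAGS="-fprofile-generate" taskset -c 0 cargo bench

# Merge the raw profiles (llvm-profdata merge for Clang), then rebuild
# with the merged profile applied:
CFLAGS="-fprofile-use=/path/to/merged.profdata" taskset -c 0 cargo bench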
Further steps
I can suggest the following action points:
- Perform more PGO benchmarks with other datasets (if you are interested enough in it). If they show improvements, add a note to the documentation (the README file, I guess) about the possible performance improvements from building the library with PGO.
- You could also try to get some insights into how the code can be optimized further, based on the changes the compiler makes with PGO. This can be done by comparing flamegraphs before and after applying PGO, or by checking the assembly/LLVM IR differences before and after PGO (see the sketch after this list).
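One possible way to do the flamegraph comparison is perf plus the FlameGraph scripts (https://github.com/brendangregg/FlameGraph); the bench binary name below is a placeholder, and I haven't run this exact pipeline against the project's benches:

```shell
# Record a profile of the Release bench binary and render a flamegraph:
perf record -g -- ./target/release/deps/<bench-binary>
perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph-release.svg

# Repeat with the PGO-optimized binary (built via cargo pgo optimize bench)
# and compare the two SVGs side by side.
```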
I would be happy to answer your questions about PGO.
P.S. This is just a benchmark report with an idea for improving the project. I created the issue only because Discussions are disabled for the repository.
Hey, thanks for sharing! A 10-15% improvement is indeed a lot, so it's worth exploring. The microbenchmarks are currently all we have, but once we get TPC-H benchmarks going (#4), it's worth checking those out too. Really curious to hear more about what optimizations PGO does. I opened Discussions for this repository so we can continue there.
A lot of code has changed, so this report is now stale. Closing.
@penberg do we need to create another issue for PGO? I still see PGO mentioned in https://github.com/tursodatabase/turso/issues/684, but if this issue stays closed, other people may think PGO doesn't work for Turso.
I re-ran the benchmark with the latest Turso version and the ClickBench suite (with your scripts). I still see measurable performance improvements from applying PGO to Turso, even with all the code that has changed since the last report. And compared to my previous report - this is a real bench suite, not a synthetic one ;)
I attached the benchmark reports for the Release Turso build (clickbench-tursodb-release.txt), the PGO-optimized Turso build (clickbench-tursodb-optimized.txt), and regular SQLite (clickbench-sqlite3.txt) for comparison. The training workload is the same bench suite, run against an instrumented TursoDB build (via cargo-pgo).
Attachments: clickbench-tursodb-release.txt, clickbench-tursodb-optimized.txt, clickbench-sqlite3.txt
So PGO is still a viable option for the database.
> Really curious to hear more about what optimizations PGO does
Long story short: much better inlining ("better" doesn't mean "more aggressive" - it depends on the workload) and a much better hot/cold code split. Even though some of these optimizations can be done "manually" via compiler intrinsics, etc., PGO simplifies this machinery a lot.
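If you want to look at the inlining side concretely, one option (untested; cargo-pgo manages its own RUSTFLAGS, so the flag interplay may need some care) is to ask rustc for LLVM inlining remarks and diff the logs between the plain Release build and the PGO-optimized build:

```shell
# Collect LLVM inliner remarks while compiling the benches (remarks go to stderr;
# debuginfo is needed for the remarks to carry source locations):
RUSTFLAGS="-Cremark=inline -Cdebuginfo=1" cargo bench --no-run 2> remarks-release.txt

# Do the same for the PGO-optimized build, then compare:
diff remarks-release.txt remarks-pgo.txt | less
```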