Profile-Guided Optimization (PGO) evaluation
Hi!
I recently ran a lot of Profile-Guided Optimization (PGO) benchmarks on different kinds of software; all currently available results are collected at https://github.com/zamazan4ik/awesome-pgo . According to these tests, PGO usually helps to achieve better performance. That's why I think testing PGO would be a good idea for Hurl. I ran some benchmarks on my local machine and want to share the results.
Test environment
- Apple Macbook M1 (full charge, AC-connected)
- macOS 13.4 Ventura
- Rust: 1.72
- Latest hurl from the master branch (commit 7ed25baac9934a1a86f61a8f4bedcdc76dbaa2a2)
Test workload
As a test scenario, I used the benches from the repo. The only differences are an increased request count (10k) and an Axum-based HTTP server instead of the Flask one (on my machine the Flask-based server gets overloaded by these benchmarks and gets stuck after a few moments).
All runs were performed on the same hardware and operating system, with the same background workload (as far as I can guarantee this on macOS). All PGO optimizations were done with cargo-pgo, and the profile information was collected from the same benchmarks.
Results
Here are the results in Hyperfine format. Benchmark 1 is the PGO-optimized binary (cargo-pgo builds it under target/aarch64-apple-darwin/release) and Benchmark 2 is the regular Release binary. According to the results below, the PGO-optimized binary is faster:
Benchmark 1: /Users/zamazan4ik/open_source/hurl/target/aarch64-apple-darwin/release/hurl tests/hello_10000.hurl
Time (mean ± σ): 3.260 s ± 0.111 s [User: 1.471 s, System: 0.666 s]
Range (min … max): 3.106 s … 3.459 s 20 runs
Benchmark 2: /Users/zamazan4ik/open_source/hurl/target/release/hurl tests/hello_10000.hurl
Time (mean ± σ): 3.726 s ± 0.282 s [User: 1.783 s, System: 0.707 s]
Range (min … max): 3.410 s … 4.505 s 20 runs
Summary
/Users/zamazan4ik/open_source/hurl/target/aarch64-apple-darwin/release/hurl tests/hello_10000.hurl ran
1.14 ± 0.10 times faster than /Users/zamazan4ik/open_source/hurl/target/release/hurl tests/hello_10000.hurl
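As a quick cross-check of the summary line, the speedup can be recomputed from the printed means and standard deviations. This is a minimal sketch that assumes Hyperfine derives the ± value by standard error propagation for two independent means; the speedup function name is hypothetical. With the rounded inputs above it gives 1.14 +/- 0.09; the slightly larger 0.10 reported by Hyperfine presumably comes from the unrounded measurements.

```shell
#!/bin/sh
# Recompute the relative speed from the rounded Hyperfine numbers above.
# Assumption: ratio uncertainty = ratio * sqrt((s1/m1)^2 + (s2/m2)^2),
# i.e. error propagation for two independent measurements.
speedup() {
    # $1 $2: mean and standard deviation of the faster binary (seconds)
    # $3 $4: mean and standard deviation of the slower binary (seconds)
    awk -v m1="$1" -v s1="$2" -v m2="$3" -v s2="$4" 'BEGIN {
        ratio = m2 / m1          # how many times faster the first binary is
        r1 = s1 / m1             # relative error of the first mean
        r2 = s2 / m2             # relative error of the second mean
        sd = ratio * sqrt(r1 * r1 + r2 * r2)
        printf "%.2f +/- %.2f\n", ratio, sd
    }'
}

# PGO: 3.260 s ± 0.111 s, Release: 3.726 s ± 0.282 s
speedup 3.260 0.111 3.726 0.282
```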
Some conclusions
- PGO shows a noticeable improvement in hurl performance (1.14 ± 0.10 times faster), at least in the benchmarks provided by the project. I expect similar gains in other scenarios, although they will vary with the workload.
Further steps
I suggest the following next steps:
- Add a note to the Hurl documentation about building with PGO. In this case, users and maintainers who build their own Hurl binaries will be aware of PGO as an additional way to optimize the project.
- Optimize the binaries that the Hurl project provides (if any) with PGO on CI, as is already done for other projects like Rustc.
- Evaluate LLVM BOLT on Hurl in addition to PGO.
Hi @zamazan4ik, could you propose a short text for the documentation about "Add a note to the Hurl documentation about building with PGO"?
Sure!
First, here is some existing PGO documentation from other projects:
- ClickHouse: https://clickhouse.com/docs/en/operations/optimizing-performance/profile-guided-optimization
- Databend: https://databend.rs/doc/contributing/pgo
- Vector: https://vector.dev/docs/administration/tuning/pgo/
- Nebula: https://docs.nebula-graph.io/3.5.0/8.service-tuning/enable_autofdo_for_nebulagraph/
- GCC: Official docs, section "Building with profile feedback"
- Clang:
  - https://llvm.org/docs/HowToBuildWithPGO.html
  - https://llvm.org/docs/AdvancedBuilds.html
- Rustc: https://rustc-dev-guide.rust-lang.org/building/optimized-build.html#profile-guided-optimization
- tsv-utils: https://github.com/eBay/tsv-utils/blob/master/docs/BuildingWithLTO.md
I hope you can find something useful in the examples above.
As for the text itself, I suggest it answers the following questions:
- What is PGO? A link to the Rustc documentation should be enough, IMHO
- What benefits does PGO bring to Hurl? Here we can reference this issue with actual benchmarks
- How to build Hurl with PGO? Here we can write a simple instruction for building Hurl with PGO via cargo-pgo or with raw compiler PGO-related options (Rustc documentation)
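For the raw compiler options, a minimal sketch could look like the following. It follows the four steps of the rustc book's PGO chapter (instrumented build, run, merge profiles, optimized build). The function names, the PGO_DIR default and the workload file are hypothetical; it assumes the llvm-tools-preview rustup component is installed and that it is run from the root of a Hurl checkout.

```shell
#!/bin/sh
# Sketch of PGO for Hurl with plain rustc flags (no cargo-pgo).
# Assumptions: llvm-tools-preview is installed via rustup; PGO_DIR and the
# workload file are hypothetical. PGO_DIR should be an absolute path.
set -eu

PGO_DIR=${PGO_DIR:-/tmp/hurl-pgo-data}

# Prints the RUSTFLAGS value for a PGO phase: "generate" or "use".
pgo_rustflags() {
    case $1 in
        generate) printf '%s\n' "-Cprofile-generate=$2" ;;
        use) printf '%s\n' "-Cprofile-use=$2/merged.profdata" ;;
        *) echo "unknown phase: $1" >&2; return 1 ;;
    esac
}

host_triple() {
    rustc -vV | sed -n 's/^host: //p'
}

# llvm-profdata from llvm-tools-preview lives inside the toolchain sysroot.
llvm_profdata() {
    "$(rustc --print sysroot)/lib/rustlib/$(host_triple)/bin/llvm-profdata" "$@"
}

raw_pgo_build_hurl() {
    workload=$1
    target=$(host_triple)

    # 1. Instrumented build. With an explicit --target, Cargo does not apply
    #    RUSTFLAGS to build scripts and proc macros.
    RUSTFLAGS="$(pgo_rustflags generate "$PGO_DIR")" \
        cargo build --release --target="$target"

    # 2. Run the instrumented binary on a representative workload;
    #    .profraw files are written to $PGO_DIR.
    "./target/$target/release/hurl" "$workload"

    # 3. Merge the .profraw files into a single .profdata file.
    llvm_profdata merge -o "$PGO_DIR/merged.profdata" "$PGO_DIR"

    # 4. Optimized build guided by the merged profile.
    RUSTFLAGS="$(pgo_rustflags use "$PGO_DIR")" \
        cargo build --release --target="$target"
}

# Usage (hypothetical workload file):
#   raw_pgo_build_hurl my_workload.hurl
```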
Where to put this instruction? I guess somewhere in the "Building from sources" documentation.
So I think the text could look like this (I used the Vector documentation about PGO as a reference):
"Profile-Guided Optimization (PGO) is a compiler optimization technique where a program is optimized based on the runtime profile.
According to the tests, PGO makes request execution in the benchmark about 14% faster on average (1.14 ± 0.10 times). The performance benefits depend on your typical workload, so you can get better or worse results.
More information about PGO in Hurl can be found in the corresponding GitHub issue.
How to build Hurl with PGO?
There are two major kinds of PGO: Instrumentation and Sampling (also known as AutoFDO). This guide describes Instrumentation PGO for Hurl, using cargo-pgo to build Hurl with PGO.
- Install cargo-pgo.
- Check out the Hurl repository.
- Go to the Hurl source directory and run cargo pgo build. It will build the instrumented Hurl version.
- Run the instrumented Hurl on your test load. Usually, performing several workload-representative requests is enough to collect a good PGO profile (but your case may differ).
- Run cargo pgo optimize. It will build Hurl with PGO optimization.
A more detailed guide on how to apply PGO is in the Rust documentation."
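The cargo-pgo steps from the proposed text can be sketched as a small shell script. This is an illustration under assumptions, not an official Hurl script: rustup and Hurl's native build dependencies are installed, it runs from the root of a Hurl checkout, an HTTP server for the workload is up, and the workload file name and function names are hypothetical.

```shell
#!/bin/sh
# Sketch of the cargo-pgo workflow for Hurl (Instrumentation PGO).
# Assumptions: rustup/cargo and Hurl's build dependencies are installed;
# the workload passed in is a hypothetical representative .hurl file.
set -eu

# cargo-pgo builds with an explicit target triple, so binaries land under
# target/<triple>/release instead of target/release.
hurl_bin_path() {
    printf 'target/%s/release/hurl\n' "$1"
}

pgo_build_hurl() {
    workload=$1
    cargo install cargo-pgo
    rustup component add llvm-tools-preview

    # 1. Build an instrumented Hurl binary.
    cargo pgo build

    # 2. Run it on a representative workload to collect profiles.
    host=$(rustc -vV | sed -n 's/^host: //p')
    "./$(hurl_bin_path "$host")" "$workload"

    # 3. Rebuild Hurl using the collected profiles.
    cargo pgo optimize
}

# Usage, from the root of a Hurl checkout (hypothetical workload file):
#   pgo_build_hurl my_workload.hurl
```

The profile only reflects the code paths the workload exercised, so a workload close to real usage matters more than its size.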
I think having something like this in the documentation would be enough.
Thanks a lot, I'll put all this in the repo under "contrib", and link it in the documentation!