Profile-Guided Optimization (PGO) evaluation

Open zamazan4ik opened this issue 2 years ago • 3 comments

Hi!

I recently ran many Profile-Guided Optimization (PGO) benchmarks on different kinds of software; all currently available results are at https://github.com/zamazan4ik/awesome-pgo . According to these tests, PGO usually improves performance, so evaluating PGO for Hurl seemed worthwhile. I ran some benchmarks on my local machine and want to share the results.

Test environment

  • Apple MacBook M1 (full charge, AC-connected)
  • macOS 13.4 Ventura
  • Rust: 1.72
  • Latest hurl from the master branch (commit 7ed25baac9934a1a86f61a8f4bedcdc76dbaa2a2 )

Test workload

As a test scenario, I used the benchmarks from the repo. The only differences are an increased request count (10k) and an Axum-based HTTP server instead of the Flask one (on my machine the Flask-based server gets overloaded by these benchmarks and hangs after a few moments).

All runs were performed on the same hardware and operating system, with the same background workload (as far as I can guarantee that on macOS). All PGO builds were done with cargo-pgo. The PGO profile was collected from the same benchmarks.

Results

Here are the results in Hyperfine format, comparing the PGO-optimized binary (Benchmark 1, under target/aarch64-apple-darwin/release) with the regular Release binary (Benchmark 2, under target/release). The PGO-optimized binary is faster:

Benchmark 1: /Users/zamazan4ik/open_source/hurl/target/aarch64-apple-darwin/release/hurl tests/hello_10000.hurl
  Time (mean ± σ):      3.260 s ±  0.111 s    [User: 1.471 s, System: 0.666 s]
  Range (min … max):    3.106 s …  3.459 s    20 runs

Benchmark 2: /Users/zamazan4ik/open_source/hurl/target/release/hurl tests/hello_10000.hurl
  Time (mean ± σ):      3.726 s ±  0.282 s    [User: 1.783 s, System: 0.707 s]
  Range (min … max):    3.410 s …  4.505 s    20 runs

Summary
  /Users/zamazan4ik/open_source/hurl/target/aarch64-apple-darwin/release/hurl tests/hello_10000.hurl ran
    1.14 ± 0.10 times faster than /Users/zamazan4ik/open_source/hurl/target/release/hurl tests/hello_10000.hurl
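
As a quick sanity check on the summary line, the ratios can be recomputed from the reported means. This is only a sketch: the four numbers are copied from the Hyperfine output above, and the variable names are mine.

```python
# Mean times copied from the Hyperfine output above (seconds).
pgo_wall, release_wall = 3.260, 3.726   # "Time (mean)" for Benchmark 1 and 2
pgo_user, release_user = 1.471, 1.783   # "User" CPU time for Benchmark 1 and 2

# "N times faster" in Hyperfine is the ratio of the mean wall-clock times.
speedup = release_wall / pgo_wall

# The same result expressed as a reduction in wall-clock time.
time_saved_pct = (release_wall - pgo_wall) / release_wall * 100

# Ratio of user-space CPU time, which excludes time spent waiting on the server.
user_speedup = release_user / pgo_user

print(f"wall-clock speedup: {speedup:.2f}x")
print(f"wall-clock time saved: {time_saved_pct:.1f}%")
print(f"user CPU speedup: {user_speedup:.2f}x")
```

So "1.14 times faster" corresponds to roughly 12.5% less wall-clock time per run, while the user CPU time drops by a somewhat larger factor (about 1.21x).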

Some conclusions

  • PGO shows a clear improvement in Hurl performance (about 1.14 times faster on average), at least in the benchmarks provided by the project. I expect similar results in other cases.

Further steps

I suggest the following:

  • Add a note to the Hurl documentation about building with PGO. In this case, users and maintainers who build their own Hurl binaries will be aware of PGO as an additional way to optimize the project.
  • Build the binaries provided by the Hurl project (if any) with PGO on the CI, as is already done for other projects such as Rustc.
  • Try to evaluate LLVM BOLT in addition to PGO on Hurl.

zamazan4ik, Sep 12 '23 01:09

Hi @zamazan4ik, could you propose a short text for the documentation, for the "Add a note to the Hurl documentation about building with PGO" item?

jcamiel, Jan 10 '24 13:01

Sure!

First, here is some existing PGO-oriented documentation from other projects:

  • ClickHouse: https://clickhouse.com/docs/en/operations/optimizing-performance/profile-guided-optimization
  • Databend: https://databend.rs/doc/contributing/pgo
  • Vector: https://vector.dev/docs/administration/tuning/pgo/
  • Nebula: https://docs.nebula-graph.io/3.5.0/8.service-tuning/enable_autofdo_for_nebulagraph/
  • GCC: Official docs, section "Building with profile feedback"
  • Clang:
    • https://llvm.org/docs/HowToBuildWithPGO.html
    • https://llvm.org/docs/AdvancedBuilds.html
  • Rustc: https://rustc-dev-guide.rust-lang.org/building/optimized-build.html#profile-guided-optimization
  • tsv-utils: https://github.com/eBay/tsv-utils/blob/master/docs/BuildingWithLTO.md

I hope you can find something useful in the examples above.

As for the short text about PGO, I suggest it answer the following questions:

  • What is PGO? A link to the Rustc documentation should be enough, IMHO
  • What benefits does PGO bring to Hurl? Here we can reference this issue with actual benchmarks
  • How to build Hurl with PGO? Here we can write simple instructions for building Hurl with PGO via cargo-pgo or with raw compiler PGO-related options (Rustc documentation)

Where should these instructions go? I guess somewhere in the "Building from sources" documentation.

So I think the text could look like this (as a reference I used Vector documentation about PGO):

"Profile-Guided Optimization (PGO) is a compiler optimization technique where a program is optimized based on the runtime profile.

According to the tests, we see improvements of up to 20% faster request executions in the benchmark. The performance benefits depend on your typical workload - you can get better or worse results.

More information about PGO in Hurl can be found in the corresponding GitHub issue.

How to build Hurl with PGO?

There are two major kinds of PGO: Instrumentation and Sampling (also known as AutoFDO). This guide describes Instrumentation PGO for Hurl, using cargo-pgo to build it.

  • Install cargo-pgo.
  • Check out the Hurl repository.
  • Go to the Hurl source directory and run cargo pgo build. It will build the instrumented Hurl version.
  • Run the instrumented Hurl on your test workload. Usually, performing several workload-representative requests is enough to collect a good PGO profile (but your case can be different).
  • Run cargo pgo optimize. It will build Hurl with PGO optimization.

A more detailed guide on how to apply PGO is in the Rust documentation."

I think having something like this in the documentation is fine.
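
The steps in the proposed text can be sketched as a small dry-run script. This is a minimal sketch under several assumptions, not a tested recipe for Hurl: the instrumented binary path is taken from the benchmark output in this issue and depends on your target triple, the workload file is the one from the benchmark and needs its test HTTP server already running, and the `llvm-tools-preview` step is not in the text above (as far as I know cargo-pgo needs it for `llvm-profdata`). Python is used only to keep the sequence scriptable; the same commands can be typed directly in a shell.

```python
import shlex
import subprocess

# Assumptions: adjust both values to your checkout, platform, and workload.
INSTRUMENTED_BIN = "target/aarch64-apple-darwin/release/hurl"  # path seen in this issue
WORKLOAD = "tests/hello_10000.hurl"                            # benchmark file from this issue


def pgo_steps(instrumented_bin, workload):
    """Return the command sequence for an instrumentation-PGO build via cargo-pgo."""
    return [
        ["cargo", "install", "cargo-pgo"],                     # install cargo-pgo
        ["rustup", "component", "add", "llvm-tools-preview"],  # assumed prerequisite
        ["cargo", "pgo", "build"],                             # build the instrumented binary
        [instrumented_bin, workload],                          # run it to collect a profile
        ["cargo", "pgo", "optimize"],                          # rebuild using the profile
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in steps:
        print("+", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


# Dry run: only prints the commands, nothing is installed or built.
run_steps(pgo_steps(INSTRUMENTED_BIN, WORKLOAD))
```

Calling `run_steps(..., dry_run=False)` from the Hurl source directory would execute the sequence; the profiling step can be repeated with several workload files before the final `cargo pgo optimize`.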

zamazan4ik, Jan 11 '24 02:01

Thanks a lot, I'll put all this in the repo under "contrib", and link it in the documentation!

jcamiel, Jan 11 '24 05:01