Evaluate Profile-Guided Optimization
Hi!
Recently I tested a lot of software with PGO and measured the resulting performance improvements; the results are here. Since my results show interesting improvements on a lot of databases (including MySQL, MariaDB, and PostgreSQL), I think it would be a good idea to measure the effects of PGO on RonDB as well. If the results show an improvement, it would be great to see a note about PGO in the documentation.
Interesting, I will definitely study your material. Our story on PGO is the following. We did some experiments on MySQL a few years ago; the results varied somewhat, but mostly showed an improvement of around 20%. When we forked MySQL into RonDB we decided to use PGO for the best throughput and latency. The measurements I saw showed that mysqld (the MySQL Server), when used with the NDB storage engine, gained about 20%, while ndbmtd (the RonDB data nodes) gained a lot less, on the order of 3-5%. Since the beginning we have built our production binaries using PGO, and so far we haven't seen any quality issues arise from this usage.
We got the above numbers with GCC 8. However, some benchmarks I did more recently indicated that newer GCC versions actually decreased performance for mysqld. My suspicion is that these newer GCC versions might be too aggressive with inlining. I haven't had time to dig any deeper into this problem, but there is still a fair gain from using PGO compared to not using it. Our release scripts are found in the tree under build_scripts/release_scripts. We build in a Docker container using CentOS 7 to ensure compatibility with the glibc versions in the OS we use in production. There is a flag to build with or without PGO. Our release builds always use the PGO flag.
We currently build production binaries using GCC 9. The new RonDB 22.10 is built with a newer GCC version. We are likely to update the GCC version used for 22.10 moving forward, but most likely not for RonDB 21.04.
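For readers unfamiliar with what a PGO build involves, here is a minimal sketch of the generic three-step GCC workflow (instrumented build, training run, optimized rebuild). It is not the RonDB release script: the function name `pgo_build_steps` and the file names `app.c`, `app`, and `pgo-data` are hypothetical placeholders, and only the GCC options `-fprofile-generate` and `-fprofile-use` are taken from the GCC documentation.

```python
from typing import List


def pgo_build_steps(source: str, binary: str, profile_dir: str) -> List[List[str]]:
    """Return the command lines for a generic GCC PGO build (illustrative only)."""
    # Step 1: build an instrumented binary that writes profile data into profile_dir.
    instrumented = [
        "gcc", "-O2", f"-fprofile-generate={profile_dir}", source, "-o", binary,
    ]
    # Step 2: run the instrumented binary on a representative workload.
    # Here the binary is simply run with no arguments; a real training run
    # would drive it with a benchmark resembling production traffic.
    training_run = [f"./{binary}"]
    # Step 3: rebuild, letting GCC use the collected profile to guide optimization.
    optimized = [
        "gcc", "-O2", f"-fprofile-use={profile_dir}", source, "-o", binary,
    ]
    return [instrumented, training_run, optimized]


if __name__ == "__main__":
    # Print the steps rather than executing them, so the sketch runs anywhere.
    for step in pgo_build_steps("app.c", "app", "pgo-data"):
        print(" ".join(step))
```

The gain depends heavily on how well the training workload in step 2 matches the real one, which is one possible reason measurements differ between mysqld and ndbmtd.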
Thanks a lot for sharing your PGO numbers for RonDB!
@mronstro Since PGO definitely improves RonDB performance, I think it's worth documenting PGO in the RonDB documentation. That way, users and anyone maintaining their own RonDB build (for example, a company) will be aware of the effects of PGO on RonDB and will be able to use it to optimize RonDB for their own workloads. We can put it somewhere in the performance-oriented section of the docs.
Here are the examples of similar documentation in other projects:
- ClickHouse: https://clickhouse.com/docs/en/operations/optimizing-performance/profile-guided-optimization
- Databend: https://databend.rs/doc/contributing/pgo
- Vector: https://vector.dev/docs/administration/tuning/pgo/
- Nebula: https://docs.nebula-graph.io/3.5.0/8.service-tuning/enable_autofdo_for_nebulagraph/
- GCC: Official docs, section "Building with profile feedback" (an AutoFDO build is also supported)
- Clang:
  - https://llvm.org/docs/HowToBuildWithPGO.html
  - https://llvm.org/docs/AdvancedBuilds.html