Evaluate using Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) on ctags
Hi!
Recently I measured the effect of Profile-Guided Optimization (PGO) on multiple projects. The results are available here. I also have several examples of applying PGO to software similar to ctags:
- LLVM-based tooling (like Clangd): link
- Clangd benchmarks with PGO from Jetbrains: JetBrains blog
- Rust Analyzer: GitHub comment
That's why I think trying to optimize ctags with PGO can be a good idea.
I can suggest the following action points:
- Perform PGO benchmarks on ctags. If they show improvements, add a note to the documentation about the possible performance gains from PGO.
- Provide an easier way (e.g. an additional build option in the build scripts) to build ctags with PGO. This can help end users and maintainers, since they will be able to optimize ctags for their own workloads.
- Optimize the pre-built binaries (if it's possible to prepare a good training workload).
Testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with regular PGO.
Here are some examples of how PGO is integrated into other projects:
- Rustc: a CI script for the multi-stage build
- GCC:
- Clang: Docs
- Python:
- Go: Bash script
- V8: Bazel flag
- ChakraCore: Scripts
- Chromium: Script
- Firefox: Docs
- Thunderbird has PGO support too
- PHP: Makefile command and old Centminmod scripts
- MySQL: CMake script
- YugabyteDB: GitHub commit
- FoundationDB: Script
- Zstd: Makefile
- Foot: Scripts
- Windows Terminal: GitHub PR
- Pydantic-core: GitHub PR
- file.d: GitHub PR
- OceanBase: CMake flag
Thank you for this input.
I tried using the Linux kernel source tree as input. With PGO, ctags is about 8% faster (elapsed time 1:14.92 before, 1:08.90 after).
Before:
[yamato@dev64]~/var/codebase% /bin/time ~/bin/ctags --options=NONE -o - -R ~/var/linux > /dev/null
ctags: Notice: No options will be read from files or environment
70.10user 9.89system 1:14.92elapsed 106%CPU (0avgtext+0avgdata 2622720maxresident)k
0inputs+0outputs (0major+887201minor)pagefaults 0swaps
After:
[yamato@dev64]~/var/codebase% /bin/time ~/bin/ctags --options=NONE -o - -R ~/var/linux > /dev/null
ctags: Notice: No options will be read from files or environment
64.04user 9.78system 1:08.90elapsed 107%CPU (0avgtext+0avgdata 2623040maxresident)k
0inputs+0outputs (0major+920723minor)pagefaults 0swaps
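As a rough cross-check, the timings above can be converted into a percentage. The sketch below is plain POSIX shell with awk; the helper name `pct_faster` is hypothetical and not part of ctags.

```shell
# Hypothetical helper: percentage reduction from a "before" time to an
# "after" time, both given in seconds.
pct_faster() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# Elapsed (wall-clock) times from the two runs: 1:14.92 and 1:08.90.
pct_faster 74.92 68.90   # prints 8.0

# User CPU times from the two runs.
pct_faster 70.10 64.04   # prints 8.6
```

Both measures point to a gain of roughly 8 to 9% on this single workload; repeated runs would be needed to estimate the noise.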
diff --git a/Makefile.am b/Makefile.am
index 4151ef0bf..ca610354b 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -67,7 +67,7 @@ noinst_LIBRARIES += libutil.a
noinst_PROGRAMS = utiltest
-AM_LDFLAGS = $(EXTRA_LDFLAGS)
+AM_LDFLAGS = $(EXTRA_LDFLAGS) $(COVERAGE_LDFLAGS)
# packcc always uses native compiler even when cross-compiling.
# packcc cannot use the standard Automake rule.
@@ -127,7 +127,7 @@ libutil_a_CPPFLAGS = -I$(srcdir) -I$(srcdir)/main
libutil_a_CFLAGS =
libutil_a_CFLAGS += $(EXTRA_CFLAGS)
libutil_a_CFLAGS += $(WARNING_CFLAGS)
-libutil_a_CFLAGS += $(COVERAGE_CFLAGS)
+libutil_a_CFLAGS += $(COVERAGE_CFLAGS) $(PGO_CFLAGS)
if ENABLE_DEBUGGING
libutil_a_CPPFLAGS+= $(DEBUG_CPPFLAGS)
endif
@@ -161,7 +161,7 @@ libctags_a_CPPFLAGS+= -DHAVE_REPOINFO_H
libctags_a_CFLAGS =
libctags_a_CFLAGS += $(EXTRA_CFLAGS)
libctags_a_CFLAGS += $(WARNING_CFLAGS)
-libctags_a_CFLAGS += $(COVERAGE_CFLAGS)
+libctags_a_CFLAGS += $(COVERAGE_CFLAGS) $(PGO_CFLAGS)
libctags_a_CFLAGS += $(CGCC_CFLAGS)
libctags_a_CFLAGS += $(LIBXML_CFLAGS)
libctags_a_CFLAGS += $(JANSSON_CFLAGS)
@@ -255,6 +255,7 @@ ctags_LDADD += $(LIBYAML_LIBS)
ctags_LDADD += $(SECCOMP_LIBS)
ctags_LDADD += $(ICONV_LIBS)
ctags_LDADD += $(PCRE2_LIBS)
+ctags_LDFLAGS = $(PGO_LDFLAGS)
dist_ctags_SOURCES = $(CMDLINE_HEADS) $(CMDLINE_SRCS)
if HOST_MINGW
diff --git a/configure.ac b/configure.ac
index 1d7909e2a..0822ed1a4 100644
--- a/configure.ac
+++ b/configure.ac
@@ -955,6 +955,29 @@ if test "$ETAGS_NAME_EXECUTABLE" != etags ; then
AC_MSG_NOTICE(Changing name of 'etags' for $ETAGS_NAME_EXECUTABLE)
fi
+# Profile Guided Optimization (PGO)
+# ------------
+PGO_STAGE=none
+PGO_CFLAGS=
+PGO_LDFLAGS=
+
+AC_ARG_WITH([pgo-stage],
+ AS_HELP_STRING([--with-pgo-stage=STAGE], [none, bootstrap, or apply]),
+ [PGO_STAGE="$withval"])
+
+if test "x$PGO_STAGE" = "xbootstrap"; then
+ PGO_CFLAGS='-fprofile-generate=$(abs_top_builddir)'
+ PGO_LDFLAGS='-fprofile-generate=$(abs_top_builddir)'
+fi
+
+if test "x$PGO_STAGE" = "xapply"; then
+ PGO_CFLAGS='-fprofile-use=$(abs_top_builddir)'
+ PGO_LDFLAGS='-fprofile-use=$(abs_top_builddir)'
+fi
+
+AC_SUBST([PGO_CFLAGS])
+AC_SUBST([PGO_LDFLAGS])
+
# Output files
# ------------
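The configure.ac hunk above accepts any value for `--with-pgo-stage`; an unrecognized stage matches neither `if` and leaves the flags empty, so a typo would silently behave like `none`. A minimal sketch of stricter handling follows, written as a standalone POSIX shell function so it can be tried outside configure. The name `pgo_flags_for_stage` and the directory argument are hypothetical, and inside configure.ac the error branch would more likely use `AC_MSG_ERROR`.

```shell
# Hypothetical helper: map a PGO stage name to the GCC flag used in the
# patch, and fail on anything other than none, bootstrap, or apply.
#   $1: stage name
#   $2: directory where the profile data is written to / read from
pgo_flags_for_stage() {
    stage=$1
    profile_dir=$2
    case "$stage" in
        none)
            printf '%s\n' ""
            ;;
        bootstrap)
            # Instrumented build: running the binary writes profile data.
            printf '%s\n' "-fprofile-generate=$profile_dir"
            ;;
        apply)
            # Optimized build: the compiler reads the recorded profile data.
            printf '%s\n' "-fprofile-use=$profile_dir"
            ;;
        *)
            echo "unknown PGO stage: $stage" >&2
            return 1
            ;;
    esac
}
```

For example, `pgo_flags_for_stage bootstrap "$PWD"` prints the instrumentation flag for the current directory, while `pgo_flags_for_stage bootstrp "$PWD"` returns a non-zero status instead of quietly disabling PGO.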
$ cd ctags
$ ./autogen.sh; make clean; ./configure; make -j 100
$ time ./ctags --options=NONE -o - -R ~/var/linux > /dev/null ; : before
$ make clean
$ ./configure --with-pgo-stage=bootstrap ; make -j 100
$ ./ctags --options=NONE -o - -R ~/var/linux > /dev/null ; : profiling
$ ./configure --with-pgo-stage=apply ; make -j 100
$ time ./ctags --options=NONE -o - -R ~/var/linux > /dev/null ; : after
After polishing, I will make a pull request for this.