[3pt] graph-prototype: Optimize compilation time
GP (graph-prototype) is a template-metaprogramming (TMP) heavy library. Investigate whether compilation times can be reduced without sacrificing runtime performance.
Primary focus (findings to be documented):
- identify bottlenecks: which template constructs are the most expensive to compile
- iterate on SFINAE vs. concepts and compare their compile-time cost
- evaluate precompiled headers (PCH)
- ...
This task can be tackled using Clang's `-ftime-trace` compile flag and its built-in compile-time instrumentation, as described in a bit more detail here and partially implemented and tested in PR#299.
How to compile and enable compile-time profiling:
cmake -GNinja -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_CXX_FLAGS="-fuse-ld=lld -ftime-trace" ..
This produces a set of `.json` files, one per compilation unit, containing the compile-time trace information. They can be inspected either with Chrome's built-in viewer (enter `chrome://tracing` in the URL bar) or via https://ui.perfetto.dev/ (same functionality, perhaps a slightly nicer UI). Both display the trace as a flame graph that can be drilled down into, just like in any runtime performance analysis.
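To get familiar with the trace viewer, it can help to profile a small translation unit with a known template hot spot first. The sketch below (all names hypothetical, not from GP) searches a 256-element type list once via recursive class-template specialisation and once via a C++17 fold expression. In the flame graph the recursive search is expected to show up as a deep chain of nested instantiations, while the fold expression should not produce such a chain; no particular timing is implied.

```cpp
// Compile-only profiling target (file name is hypothetical):
//   clang++ -std=c++20 -ftime-trace -c typelist_search.cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// (1) Recursive search: each step derives from the next specialisation, so searching
//     N types yields a chain of up to N nested class-template instantiations.
template<typename T, typename... Ts>
struct contains_recursive : std::false_type {};  // empty list: not found

template<typename T, typename Head, typename... Tail>
struct contains_recursive<T, Head, Tail...>
    : std::conditional_t<std::is_same_v<T, Head>,
                         std::true_type,                     // found: stop recursing
                         contains_recursive<T, Tail...>> {}; // otherwise: recurse on the tail

// (2) Fold expression: no nested chain, one std::is_same_v check per element.
template<typename T, typename... Ts>
inline constexpr bool contains_fold = (std::is_same_v<T, Ts> || ...);  // false for an empty list

// Build the type list Idx<0>, ..., Idx<N-1> and search for its last element,
// which forces the recursive version to walk the whole list.
template<std::size_t I>
using Idx = std::integral_constant<std::size_t, I>;

inline constexpr std::size_t N = 256;  // hypothetical size; increase to make the effect more visible

template<std::size_t... Is>
constexpr bool find_last_recursive(std::index_sequence<Is...>) {
    return contains_recursive<Idx<N - 1>, Idx<Is>...>::value;
}

template<std::size_t... Is>
constexpr bool find_last_fold(std::index_sequence<Is...>) {
    return contains_fold<Idx<N - 1>, Idx<Is>...>;
}

static_assert(find_last_recursive(std::make_index_sequence<N>{}));
static_assert(find_last_fold(std::make_index_sequence<N>{}));
```

Commenting out one of the two `static_assert` lines and re-running the trace makes it easier to attribute the instantiation entries to each variant.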
Post-processing with https://github.com/aras-p/ClangBuildAnalyzer/ (i.e. creating histograms of common template patterns) works either on a single `.json` file or on a whole build sub-directory:
ClangBuildAnalyzer --all <artifacts_folder> <capture_file>
Processing all files and saving to '<capture_file>'...
done in 4.2s. Run 'ClangBuildAnalyzer --analyze <capture_file> > <text output>' to analyze it.
You may want to create a default `ClangBuildAnalyzer.ini` and, notably, set `maxNameLength = 70` so that the full template and compilation-unit names are shown.
As general guidance:
- first try to reduce the file- and template-specific overheads,
- only then tackle global compile-time optimisations such as PCH, unity builds, etc.,
since the latter may hide structural problems that re-emerge only very (too?) late in further development.
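One example of a file-specific overhead that can be removed without global measures: a heavy class template used with the same arguments in many translation units can be instantiated once, via an explicit instantiation declaration (`extern template`) in the header plus a single explicit instantiation definition in one source file. The sketch below compresses header and source into one file for brevity; `HeavyBlock` and the file names in the comments are hypothetical stand-ins, not GP types.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// ---- would live in a header, e.g. "heavy_block.hpp" (hypothetical) ----
template<typename T>
class HeavyBlock {  // hypothetical stand-in for an expensive-to-instantiate template
public:
    explicit HeavyBlock(std::size_t n);
    T sum() const;
private:
    std::vector<T> data_;
};

// Members are defined out of class on purpose: functions defined inside the class
// body are implicitly inline, and extern template does not suppress instantiating those.
template<typename T>
HeavyBlock<T>::HeavyBlock(std::size_t n) : data_(n, T{1}) {}

template<typename T>
T HeavyBlock<T>::sum() const { return std::accumulate(data_.begin(), data_.end(), T{}); }

// Explicit instantiation declaration: TUs that include the header do not implicitly
// instantiate HeavyBlock<float>'s non-inline members; they link against the definition.
extern template class HeavyBlock<float>;

// ---- would live in exactly one source file, e.g. "heavy_block.cpp" (hypothetical) ----
// Explicit instantiation definition: the single place HeavyBlock<float> is compiled.
template class HeavyBlock<float>;
```

Whether this actually pays off should be verified with the same `-ftime-trace`/ClangBuildAnalyzer workflow, since the gain depends on how many TUs use the same instantiation.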