gentooLTO
LTO benchmarks
Hi guys, I was always wondering how much faster an LTO-optimized system is compared to a default one. One can use the Phoronix Test Suite, but I've written some simple bash scripts of my own to run 28 benchmarks. Here are the results:
Test on Intel Xeon E3-1265L V2, 2.50GHz, RUNS=20
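The scripts themselves are not shown in this thread, so here is only a rough sketch of how a "mean ± error over RUNS runs" figure could be measured. The function name `time_command`, the placeholder workload, and the choice of the standard error of the mean as the ± value are all assumptions for illustration, not the original scripts.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=20):
    """Run cmd `runs` times; return (mean, standard error of the mean) in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the benchmark's own output so printing does not skew the timing.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    mean = statistics.mean(samples)
    # Assumption: the quoted ± is the standard error of the mean.
    # stdev() needs at least two samples, so fall back to 0.0 for a single run.
    sem = statistics.stdev(samples) / len(samples) ** 0.5 if runs > 1 else 0.0
    return mean, sem


if __name__ == "__main__":
    # Hypothetical workload: start the Python interpreter and exit immediately.
    mean, sem = time_command([sys.executable, "-c", "pass"], runs=5)
    print(f"{mean:.3f}±{sem:.3f} s")
```

A real run would replace the placeholder command with the benchmark invocation, for example a compression tool working on a fixed input file.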
Default Gentoo: -march=native -O2 -march=native
Default LTO: -march=native -O3 ${GRAPHITE} ${DEVIRTLTO} ${IPAPTA} ${SEMINTERPOS} ${FLTO} -fuse-linker-plugin
Only LTO: -march=native -O2 ${FLTO} -fuse-linker-plugin
World was rebuilt for every configuration; however, the script lists only the packages that need to be rebuilt for the benchmarks.
For relative performance, + means faster (better) and - means slower (worse).
Bench | Default Gentoo [s] | Default LTO [s] | Default LTO [% rel. to Gentoo] | Only LTO [s] | Only LTO [% rel. to Gentoo] |
---|---|---|---|---|---|
bash | 12.369±0.017 | 11.480±0.074 | +7.74±0.71 | 11.419±0.017 | +8.31±0.22 |
ash | 8.208±0.015 | 8.100±0.055 | +1.33±0.71 | 8.061±0.021 | +1.82±0.32 |
dash | 4.472±0.010 | 4.298±0.041 | +4.04±1.02 | 4.280±0.014 | +4.48±0.41 |
bc | 16.764±0.011 | 18.458±0.014 | -9.17±0.09 | 17.723±0.012 | -5.41±0.09 |
java | 27.484±0.089 | 27.678±0.122 | -0.70±0.54 | 27.368±0.069 | +0.43±0.41 |
lammps | 22.069±0.217 | 21.985±0.234 | +0.38±1.45 | 22.502±0.232 | -1.92±1.40 |
lzop | 2.422±0.008 | 2.487±0.031 | -2.61±1.26 | 2.420±0.01 | +0.08±0.53 |
lz4 | 2.321±0.008 | 4.879±0.035 | -52.42±0.38 | 2.335±0.009 | -0.60±0.51 |
zstd | 1.744±0.020 | 1.777±0.029 | -1.86±1.96 | 1.732±0.014 | +0.69±1.41 |
gzip | 25.607±0.013 | 24.893±0.035 | +2.86±0.15 | 25.261±0.021 | +1.36±0.10 |
pigz | 5.472±0.008 | 5.156±0.025 | +6.13±0.54 | 5.472±0.010 | 0.00±0.23 |
zopfli | 21.823±0.015 | 21.756±0.052 | +0.31±0.25 | 21.918±0.022 | -0.43±0.12 |
pigz.zopfli | 6.348±0.008 | 5.885±0.028 | +7.86±0.53 | 5.973±0.010 | +6.27±0.22 |
xz | 13.280±0.020 | 13.244±0.041 | +0.27±0.35 | 13.307±0.015 | -0.20±0.19 |
lrzip | 12.597±0.051 | 12.876±0.060 | -2.17±0.60 | 12.600±0.053 | -0.02±0.58 |
gcc | 281.756±0.453 | 280.858±0.485 | +0.31±0.24 | 276.708±0.379 | +1.82±0.22 |
ccache | 23.567±3.722 | 23.421±3.672 | +0.62±22.39 | 23.177±3.649 | +1.68±22.68 |
clang | 704.925±0.490 | 578.851±0.557 | +21.77±0.14 | 711.065±0.467 | -0.86±0.09 |
eix | 2.291±0.613 | 2.775±0.584 | -17.44±28.10 | 2.562±0.520 | -10.57±30.03 |
emerge | 20.137±2.266 | 18.546±2.189 | +8.57±17.71 | 18.878±1.715 | +6.66±15.43 |
normalize | 0.240±0.009 | 0.346±0.012 | -30.63±3.54 | 0.237±0.007 | +1.26±4.83 |
flac | 7.572±0.123 | 7.074±0.248 | +7.03±4.14 | 8.071±0.110 | -6.18±1.99 |
ogg | 7.925±0.131 | 7.580±0.380 | +4.55±5.52 | 7.934±0.118 | -0.11±2.22 |
lame | 14.085±0.130 | 12.112±0.337 | +16.28±3.41 | 13.896±0.125 | +1.36±1.31 |
mencoder | 22.345±0.202 | 22.532±0.397 | -0.82±1.96 | 22.366±0.218 | -0.09±1.33 |
jpegtran | 23.133±0.007 | 23.620±0.062 | -2.06±0.26 | 23.108±0.006 | +0.10±0.04 |
optipng | 44.846±0.013 | 39.837±0.035 | +12.57±0.10 | 44.052±0.014 | +1.80±0.04 |
zopflipng | 14.085±0.023 | 13.756±0.047 | +2.39±0.39 | 14.375±0.025 | -2.01±0.23 |
average [%] | | | -1.79 | | +0.28 |
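The percentage columns are not defined explicitly in the post. The values appear consistent with a speed ratio, (t_default / t_variant - 1) × 100, with the uncertainty obtained by first-order error propagation. That formula is inferred from the numbers, not stated by the author, so treat this as a sketch; the function name `relative_speedup` is made up for illustration.

```python
import math


def relative_speedup(t_ref, err_ref, t_new, err_new):
    """Return (percent change, uncertainty) of t_new relative to t_ref.

    Positive means the new build is faster. The uncertainty assumes the
    two timings are independent and combines their relative errors in
    quadrature (first-order propagation for a ratio).
    """
    ratio = t_ref / t_new
    rel = (ratio - 1.0) * 100.0
    rel_err = ratio * math.hypot(err_ref / t_ref, err_new / t_new) * 100.0
    return rel, rel_err


# bash row: Default Gentoo 12.369±0.017 s vs Default LTO 11.480±0.074 s
rel, err = relative_speedup(12.369, 0.017, 11.480, 0.074)
print(f"{rel:+.2f}±{err:.2f}")  # prints +7.74±0.71, matching the table
```

Because the table prints rounded timings, recomputing other rows this way can differ from the listed percentages in the last digit.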
It looks like in most cases we gain performance, but in some (bc, lzop, lz4, lrzip, normalize, jpegtran) we lose. It seems most of the losses are caused by the advanced optimizations, not by LTO itself (except for bc and flac).
Also, averaging over all packages shows a loss of 1.79% for default gentooLTO and a fairly small gain of 0.28% for LTO only.
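Several rows carry uncertainties larger than the change itself (eix, for example), so it helps to separate clear losses from noise. The sketch below flags a loss only when it exceeds twice its quoted uncertainty. The 2-sigma threshold and the name `significant_losses` are assumptions, not something the post specifies; the data are the negative entries of the "Default LTO [% rel. to Gentoo]" column.

```python
# (bench, relative change in %, uncertainty in %) for the negative rows of
# the "Default LTO [% rel. to Gentoo]" column, in table order.
ROWS = [
    ("bc", -9.17, 0.09),
    ("java", -0.70, 0.54),
    ("lzop", -2.61, 1.26),
    ("lz4", -52.42, 0.38),
    ("zstd", -1.86, 1.96),
    ("lrzip", -2.17, 0.60),
    ("eix", -17.44, 28.10),
    ("normalize", -30.63, 3.54),
    ("mencoder", -0.82, 1.96),
    ("jpegtran", -2.06, 0.26),
]


def significant_losses(rows, sigmas=2.0):
    """Names of benchmarks whose slowdown exceeds `sigmas` times its uncertainty."""
    return [name for name, rel, err in rows
            if rel < 0 and abs(rel) > sigmas * err]


print(significant_losses(ROWS))
# ['bc', 'lzop', 'lz4', 'lrzip', 'normalize', 'jpegtran']
```

With this threshold the flagged set is the same six packages listed above, while java, zstd, eix, and mencoder fall within their error bars.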
Should we optimize the packages separately? What do you think?
This looks really cool, thanks for taking the time to do world rebuilds. I'll give it a shot soon. One suggestion: mention the versions used. If certain versions are blocked, a prefix could be used to build them in.
Wow, thanks for doing this! I wouldn't be surprised at all at this point if Graphite were to blame for some of the negative discrepancies shown. One interesting result is lz4 taking around twice the amount of time.
Tried lz4, as the one "suffering" the most. On my Ryzen I did not see any difference: -O2 vs -O3 vs the full-fledged LTO package showed the same performance, within maybe a 1% gap. The ebuild is dead simple and obviously doesn't fiddle with CFLAGS, so it may be some Intel+GCC issue...
More results from another benchmark for your viewing pleasure: https://openbenchmarking.org/result/1307063-UT-GCCOPTIMI03
Looks like graphite doesn't bring in any benefits, to put it mildly...
First of all, that looks like a ten-year-old benchmark.
IMHO graphite should be disabled by default. I had some ugly and unpredictable bugs due to it, and not only does it not seem to bring any performance benefits, it can also significantly slow down some programs, like zstd.
> IMHO graphite should be disabled by default. I had some ugly and unpredictable bugs due to it, and not only does it not seem to bring any performance benefits, it can also significantly slow down some programs, like zstd.
I believe Clear Linux enables or disables graphite on a per-bundle basis. They've never said that graphite is enabled system-wide.
> Looks like graphite doesn't bring in any benefits, to put it mildly...
For a lot of those tests in the link I sent, lower is better on the scales; however, there are definitely some conflicting results there.