LTO benchmarks

Open jfikar opened this issue 3 years ago • 9 comments

Hi guys, I was always wondering how much faster an LTO-optimized system is compared to a default one. One can use the Phoronix Test Suite, but I've created simple self-made bash scripts to run 28 benchmarks. Here are the results:

Test on Intel Xeon E3-1265L V2, 2.50GHz, RUNS=20
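The benchmark scripts themselves are not shown in this thread. As a rough illustration of what RUNS=20 with a ± spread could look like, here is a minimal Python sketch. The function name `time_command` and the use of the sample standard deviation as the spread are assumptions, not taken from the original bash scripts.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=20):
    """Run `cmd` `runs` times; return (mean, sample standard deviation) in seconds.

    Hypothetical helper: the original scripts may measure time differently.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output so terminal I/O does not dominate the timing.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    # Placeholder workload; substitute the real benchmark command here.
    mean, spread = time_command([sys.executable, "-c", "pass"], runs=20)
    print(f"{mean:.3f}±{spread:.3f}")
```

Wall-clock timing like this includes process startup, so very short benchmarks will show a relatively large spread.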

Default Gentoo: -march=native -O2 -march=native

Default LTO: -march=native -O3 ${GRAPHITE} ${DEVIRTLTO} ${IPAPTA} ${SEMINTERPOS} ${FLTO} -fuse-linker-plugin

Only LTO: -march=native -O2 ${FLTO} -fuse-linker-plugin

World was always rebuilt; however, the script lists just the packages that need to be rebuilt for the benchmarks.

For relative performance, + means faster (better) and - means slower (worse).

| Bench | Default Gentoo [s] | Default LTO [s] | Default LTO [% rel. to Gentoo] | Only LTO [s] | Only LTO [% rel. to Gentoo] |
|---|---|---|---|---|---|
| bash | 12.369±0.017 | 11.480±0.074 | +7.74±0.71 | 11.419±0.017 | +8.31±0.22 |
| ash | 8.208±0.015 | 8.100±0.055 | +1.33±0.71 | 8.061±0.021 | +1.82±0.32 |
| dash | 4.472±0.010 | 4.298±0.041 | +4.04±1.02 | 4.280±0.014 | +4.48±0.41 |
| bc | 16.764±0.011 | 18.458±0.014 | -9.17±0.09 | 17.723±0.012 | -5.41±0.09 |
| java | 27.484±0.089 | 27.678±0.122 | -0.70±0.54 | 27.368±0.069 | +0.43±0.41 |
| lammps | 22.069±0.217 | 21.985±0.234 | +0.38±1.45 | 22.502±0.232 | -1.92±1.40 |
| lzop | 2.422±0.008 | 2.487±0.031 | -2.61±1.26 | 2.420±0.01 | +0.08±0.53 |
| lz4 | 2.321±0.008 | 4.879±0.035 | -52.42±0.38 | 2.335±0.009 | -0.60±0.51 |
| zstd | 1.744±0.020 | 1.777±0.029 | -1.86±1.96 | 1.732±0.014 | +0.69±1.41 |
| gzip | 25.607±0.013 | 24.893±0.035 | +2.86±0.15 | 25.261±0.021 | +1.36±0.10 |
| pigz | 5.472±0.008 | 5.156±0.025 | +6.13±0.54 | 5.472±0.010 | 0.00±0.23 |
| zopfli | 21.823±0.015 | 21.756±0.052 | +0.31±0.25 | 21.918±0.022 | -0.43±0.12 |
| pigz.zopfli | 6.348±0.008 | 5.885±0.028 | +7.86±0.53 | 5.973±0.010 | +6.27±0.22 |
| xz | 13.280±0.020 | 13.244±0.041 | +0.27±0.35 | 13.307±0.015 | -0.20±0.19 |
| lrzip | 12.597±0.051 | 12.876±0.060 | -2.17±0.60 | 12.600±0.053 | -0.02±0.58 |
| gcc | 281.756±0.453 | 280.858±0.485 | +0.31±0.24 | 276.708±0.379 | +1.82±0.22 |
| ccache | 23.567±3.722 | 23.421±3.672 | +0.62±22.39 | 23.177±3.649 | +1.68±22.68 |
| clang | 704.925±0.490 | 578.851±0.557 | +21.77±0.14 | 711.065±0.467 | -0.86±0.09 |
| eix | 2.291±0.613 | 2.775±0.584 | -17.44±28.10 | 2.562±0.520 | -10.57±30.03 |
| emerge | 20.137±2.266 | 18.546±2.189 | +8.57±17.71 | 18.878±1.715 | +6.66±15.43 |
| normalize | 0.240±0.009 | 0.346±0.012 | -30.63±3.54 | 0.237±0.007 | +1.26±4.83 |
| flac | 7.572±0.123 | 7.074±0.248 | +7.03±4.14 | 8.071±0.110 | -6.18±1.99 |
| ogg | 7.925±0.131 | 7.580±0.380 | +4.55±5.52 | 7.934±0.118 | -0.11±2.22 |
| lame | 14.085±0.130 | 12.112±0.337 | +16.28±3.41 | 13.896±0.125 | +1.36±1.31 |
| mencoder | 22.345±0.202 | 22.532±0.397 | -0.82±1.96 | 22.366±0.218 | -0.09±1.33 |
| jpegtran | 23.133±0.007 | 23.620±0.062 | -2.06±0.26 | 23.108±0.006 | +0.10±0.04 |
| optipng | 44.846±0.013 | 39.837±0.035 | +12.57±0.10 | 44.052±0.014 | +1.80±0.04 |
| zopflipng | 14.085±0.023 | 13.756±0.047 | +2.39±0.39 | 14.375±0.025 | -2.01±0.23 |
| average [%] | | | -1.79 | | +0.28 |
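For anyone wanting to recompute the relative columns: the rows appear consistent with (t_gentoo / t_test - 1) × 100 and first-order error propagation through the quotient. That formula is inferred from the numbers, not taken from the original scripts, and the function name `relative_speedup` is made up for illustration. A minimal Python sketch:

```python
import math


def relative_speedup(ref, ref_err, test, test_err):
    """Return (percent, percent_err); positive means `test` is faster than `ref`.

    Assumed formula: percent = (ref / test - 1) * 100, with independent
    relative errors added in quadrature.
    """
    ratio = ref / test
    percent = (ratio - 1.0) * 100.0
    # First-order propagation of uncorrelated errors through a quotient.
    ratio_err = ratio * math.sqrt((ref_err / ref) ** 2 + (test_err / test) ** 2)
    return percent, ratio_err * 100.0


# bash row: Default Gentoo 12.369±0.017 s vs. Default LTO 11.480±0.074 s
pct, err = relative_speedup(12.369, 0.017, 11.480, 0.074)
print(f"{pct:+.2f}±{err:.2f}")  # prints +7.74±0.71
```

Because the published times are rounded to three decimals, recomputed values can differ from the table in the last digit.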

It looks like in most cases we gain performance, but in some (bc, lzop, lz4, lrzip, normalize, jpegtran) we lose. Most of the losses seem to be caused by the advanced optimizations, not by LTO itself (except for bc and flac).

Also, averaging over all packages shows a loss of 1.79% for default gentooLTO and a quite small gain of 0.28% for LTO only.

Should we optimize the packages separately? What do you think?

jfikar avatar Sep 04 '20 15:09 jfikar

This looks really cool, thanks for taking the time to do world rebuilds. I'll give it a shot soon. A suggestion I have is to mention the versions used. If certain versions are blocked, a prefix could be used to build in.

jiblime avatar Sep 05 '20 03:09 jiblime

Wow, thanks for doing this! I wouldn't be surprised at all at this point if Graphite were to blame for some of the negative discrepancies shown. One interesting result is lz4 taking around twice the amount of time.

InBetweenNames avatar Sep 26 '20 15:09 InBetweenNames

I tried lz4, as it is the one "suffering" the most. On my Ryzen I did not see any difference: -O2 vs. -O3 vs. a full-fledged LTO package showed the same performance, within maybe a 1% gap. The ebuild is dead simple and obviously doesn't fiddle with CFLAGS. So it may be some Intel+GCC issue...

kanyck avatar Jan 21 '21 20:01 kanyck

More results from another benchmark for your viewing pleasure: https://openbenchmarking.org/result/1307063-UT-GCCOPTIMI03

WillPower3309 avatar Feb 16 '21 04:02 WillPower3309

Looks like graphite doesn't bring in any benefits, to put it mildly...

kanyck avatar Feb 18 '21 12:02 kanyck

First of all, that looks like a ten-year-old benchmark.

pchome avatar Feb 18 '21 16:02 pchome

IMHO graphite should be disabled by default. I had some ugly and unpredictable bugs due to it, and not only does it not seem to bring any performance benefits, it can also significantly slow down some programs, like zstd.

barolo avatar Feb 18 '21 17:02 barolo

> IMHO graphite should be disabled by default. I had some ugly and unpredictable bugs due to it, and not only does it not seem to bring any performance benefits, it can also significantly slow down some programs, like zstd.

I believe Clear Linux enables or disables graphite on a per-bundle basis. They've never said that graphite is enabled system-wide.

addeps3 avatar Feb 18 '21 18:02 addeps3

> Looks like graphite doesn't bring in any benefits, to put it mildly...

For a lot of the tests in the link I sent, lower is better on the scale; however, there are definitely some conflicting results there.

WillPower3309 avatar Feb 18 '21 18:02 WillPower3309