benchmarks
benchmarks copied to clipboard
Revisit compiler options for tests
Some tests use compilation flags which are not used in the production (like -d:danger
or -mbranches-within-32B-boundaries
or -fbounds-check=off
) thus not giving the realistic results. README should be up-to-dated with the proper instructions to not use non-production flags, and the flags themselves should be up-to-dated too.
The branches within 32B boundaries is the fix for an intel cpu defect that causes inconsistent performance dur to simple changes. https://github.com/kostya/benchmarks/pull/266
@beached That's a tricky one. Officially, it's just a performance regression on Intel processors, not a "defect", and could be eventually fixed by the microcode update. However, I have the latest update for my CPU (released a month ago by Intel), and that flag still shows the performance changes. Moreover, various vendors apply that flag (or rather the required workaround) in their code too, for example - OpenJDK: https://github.com/openjdk/jdk/commit/ccdde497287175933982a5907d1bb8611c0e15a8
I hoped that GCC/LLVM will include that flag into the release meta-flag (-O3
), but looks like it's not happening, and I'm not sure why. I'd inclined to remove that flag and assume that eventually it'll become default (or Intel provide proper update), but as the same time it will make the tests a little bit unfair as other vendors like Oracle already applied that workaround. @kostya Do you have a second opinion regarding that?
my understanding was the flag is because the microcode fix causes perf issues when conditionals are not aligned. the benchmarking issue is the difference can be large when something changes out of the benchmarked code because a new statement throws the expressed machine code out of a alignment and the cpu flushes caches.
me too, i thought it would belong in the arch flags as it affects only a class of cpus
i think we not removing 32B boundaries as an exception
Thanks @ricvelozo for linking this issue.
How about providing both options in the tests (i.e. one with bounds checked and one without) as proposed in https://github.com/kostya/benchmarks/pull/378#issuecomment-942150078 ?
Bounds checking implementations also differ among languages, so this is definitely something which we want to have included in the benchmark results.
Btw. I woudn't do any exceptions for microcode "mistakes" and would not allow things like -mbranches-within-32B-boundaries
.
Btw. I woudn't do any exceptions for microcode "mistakes" and would not allow things like
-mbranches-within-32B-boundaries
.
The issue this fixes is that a large majority of intel CPU's cannot do reliable benchmarking and this makes it consistent or did. Prior it would penalize libraries if they happen to generate branch code that sat off a 32byte boundary. So adding/removing code from a library would have random performance impacts.
@beached yes, I understand the consequences.
My point is, that this is rather a precondition for running this "kostya benchmarks" suit (i.e. running it on a CPU which is known to not exhibit this random performance spikes/downs) rather than a "feature" of the individual tests in the benchmark.
In Swift we use the -Ounchecked
flag, which disables integer overflow checks and preconditions (release mode asserts). It means some methods can do UB.
In standard release mode -O
, just the debug asserts are disabled, and the preconditions abort the program silently on failure. I don't know what optimization level is used in real world apps in Apple Store.
In C++ we use -O3
flag, which in some cases produces wrong assembly or worse performance. Asserts are disabled in release builds, but static_assert
aren't (but are compile-time).
In Rust, the standard release mode disables integer overflow checks, but it can be configured separately. The normal asserts cannot be disabled.