benchmarks icon indicating copy to clipboard operation
benchmarks copied to clipboard

Revisit compiler options for tests

Open nuald opened this issue 4 years ago • 8 comments

Some tests use compilation flags which are not used in the production (like -d:danger or -mbranches-within-32B-boundaries or -fbounds-check=off) thus not giving the realistic results. README should be up-to-dated with the proper instructions to not use non-production flags, and the flags themselves should be up-to-dated too.

nuald avatar Dec 14 '20 18:12 nuald

The branches within 32B boundaries is the fix for an intel cpu defect that causes inconsistent performance dur to simple changes. https://github.com/kostya/benchmarks/pull/266

beached avatar Dec 14 '20 19:12 beached

@beached That's a tricky one. Officially, it's just a performance regression on Intel processors, not a "defect", and could be eventually fixed by the microcode update. However, I have the latest update for my CPU (released a month ago by Intel), and that flag still shows the performance changes. Moreover, various vendors apply that flag (or rather the required workaround) in their code too, for example - OpenJDK: https://github.com/openjdk/jdk/commit/ccdde497287175933982a5907d1bb8611c0e15a8

I hoped that GCC/LLVM will include that flag into the release meta-flag (-O3), but looks like it's not happening, and I'm not sure why. I'd inclined to remove that flag and assume that eventually it'll become default (or Intel provide proper update), but as the same time it will make the tests a little bit unfair as other vendors like Oracle already applied that workaround. @kostya Do you have a second opinion regarding that?

nuald avatar Dec 14 '20 23:12 nuald

my understanding was the flag is because the microcode fix causes perf issues when conditionals are not aligned. the benchmarking issue is the difference can be large when something changes out of the benchmarked code because a new statement throws the expressed machine code out of a alignment and the cpu flushes caches.

me too, i thought it would belong in the arch flags as it affects only a class of cpus

beached avatar Dec 14 '20 23:12 beached

i think we not removing 32B boundaries as an exception

kostya avatar Dec 15 '20 00:12 kostya

Thanks @ricvelozo for linking this issue.

How about providing both options in the tests (i.e. one with bounds checked and one without) as proposed in https://github.com/kostya/benchmarks/pull/378#issuecomment-942150078 ?

Bounds checking implementations also differ among languages, so this is definitely something which we want to have included in the benchmark results.

Btw. I woudn't do any exceptions for microcode "mistakes" and would not allow things like -mbranches-within-32B-boundaries.

dumblob avatar Oct 13 '21 13:10 dumblob

Btw. I woudn't do any exceptions for microcode "mistakes" and would not allow things like -mbranches-within-32B-boundaries.

The issue this fixes is that a large majority of intel CPU's cannot do reliable benchmarking and this makes it consistent or did. Prior it would penalize libraries if they happen to generate branch code that sat off a 32byte boundary. So adding/removing code from a library would have random performance impacts.

beached avatar Oct 13 '21 13:10 beached

@beached yes, I understand the consequences.

My point is, that this is rather a precondition for running this "kostya benchmarks" suit (i.e. running it on a CPU which is known to not exhibit this random performance spikes/downs) rather than a "feature" of the individual tests in the benchmark.

dumblob avatar Oct 13 '21 19:10 dumblob

In Swift we use the -Ounchecked flag, which disables integer overflow checks and preconditions (release mode asserts). It means some methods can do UB.

In standard release mode -O, just the debug asserts are disabled, and the preconditions abort the program silently on failure. I don't know what optimization level is used in real world apps in Apple Store.

In C++ we use -O3 flag, which in some cases produces wrong assembly or worse performance. Asserts are disabled in release builds, but static_assert aren't (but are compile-time).

In Rust, the standard release mode disables integer overflow checks, but it can be configured separately. The normal asserts cannot be disabled.

ricvelozo avatar Mar 10 '23 17:03 ricvelozo