criterion.nim
use volatileLoad to prevent optimizing away results
In my own benchmarking lib I'm introducing preventOptimizeOut, based on volatileLoad, to:
- prevent results from being optimized out
- have minimal impact on runtime performance
- be as simple as possible to use for users of benchmark
- work with all input types
benchmark.nim:
from std/volatile import volatileLoad
template benchmark(...) ...
proc preventOptimizeOut*[T](a: T) =
  ## make sure `a` computation doesn't get optimized away
  var b = false
  if volatileLoad(addr b):
    # `if b` would not work, need volatile
    echo repr(a)
test.nim:
import std/math  # for sin
echo: benchmark("test_benchmark", 1000):
  var a = 0.0
  for i in 0..<100000:
    a += sin(i.float)
  preventOptimizeOut(a)
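Because the benchmark template is elided above, here is a self-contained sketch that runs preventOptimizeOut on the same workload with a plain timing loop instead. The sumOfSines proc and the std/monotimes loop are hypothetical stand-ins for the elided benchmark template, not part of the original lib.

```nim
import std/[math, monotimes, times, volatile]

proc preventOptimizeOut*[T](a: T) =
  ## Make sure the computation of `a` isn't optimized away: the branch depends
  ## on a volatile read, so the compiler cannot prove it is dead code.
  var b = false
  if volatileLoad(addr b):
    # Never taken at runtime (`b` stays false), so nothing is printed.
    echo repr(a)

proc sumOfSines(n: int): float =
  ## Hypothetical helper: the workload from test.nim, factored into a proc.
  var a = 0.0
  for i in 0..<n:
    a += sin(i.float)
  preventOptimizeOut(a)
  result = a

when isMainModule:
  # Hypothetical stand-in for the elided `benchmark` template.
  let start = getMonoTime()
  for _ in 1..100:
    discard sumOfSines(100_000)
  echo "100 runs took ", inMilliseconds(getMonoTime() - start), " ms"
```

Even though the caller discards the result, the volatile-guarded branch inside preventOptimizeOut keeps the loop from being removed.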
The problem here is that you want a barrier that has little or no influence on the benchmark result. For inspiration, you can check out what Google's benchmark library does (or Facebook's folly, whose implementation is similar).
That said, I'd be happy to introduce a proc doNotOptimize[T](val: var T) and will have a look ASAP (if nobody beats me to the punch :)
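One way the proposed doNotOptimize[T](val: var T) could be filled in is the empty inline-asm barrier, similar in spirit to (but not copied from) the trick Google's benchmark library uses. This is a sketch that assumes the C backend compiled with GCC or Clang (MSVC does not accept this asm syntax); escapePtr is a hypothetical helper name.

```nim
proc escapePtr(p: pointer) {.inline.} =
  ## Hypothetical helper (GCC/Clang only). The empty asm statement takes `p`
  ## as an input and clobbers "memory", so the C compiler must assume the
  ## pointed-to data is read here and cannot drop the stores that produced it.
  {.emit: ["__asm__ __volatile__(\"\" : : \"g\"(", p, ") : \"memory\");"].}

proc doNotOptimize*[T](val: var T) {.inline.} =
  ## Sketch of the proposed API: let the address of `val` escape into asm.
  escapePtr(addr val)

when isMainModule:
  var total = 0
  for i in 1..10:
    total += i
  doNotOptimize(total)  # the sum must actually be stored before this point
  echo total
```

No instructions are emitted for the asm itself; the cost is only that `val` must live in memory at that point.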
> The problem here is that you want a barrier that has little or no influence on the benchmark result
Yes; doesn't the preventOptimizeOut implementation I suggested above achieve that goal? If not, what would the problems with it be?
I've pushed a simpler and hopefully lighter barrier called blackBox; try it out and tell me if it works for you.
Hmm, I wonder whether your version guarantees that the compiler won't optimize away the expression; a test would at least help. Consider:
void inner(char const volatile *x){}
Since x is never accessed, can't the compiler optimize the call away even though x is volatile? See e.g. https://stackoverflow.com/a/51488739/1426932: according to the "as if" rule mentioned there, optimizing it away doesn't change the observable behavior of your code.
Note that in my snippet the compiler can't optimize it away, since the behavior (echo repr(a)) depends on the volatile read of b. Also note that it won't actually echo anything, since b is always false.
One possible downside of my snippet is that the code for repr(a) still has to be generated (even though it is never executed), which could affect the instruction cache (and for some types it could be a lot of code).
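A different technique that avoids generating any repr code is a volatile store into a sink: the store must be performed, so the value must be computed first. This is a hedged sketch that is not part of either lib; sinkValue is a hypothetical name, and it is restricted to simple value types here to keep it obviously correct.

```nim
import std/volatile

proc sinkValue*[T: SomeNumber | bool | char](x: T) =
  ## Hypothetical helper: volatile-store `x` into a global. The C compiler
  ## must perform the volatile store, so `x` has to be computed first, and
  ## unlike preventOptimizeOut there is no `repr` call to generate code for.
  var sink {.global.}: T  # one global per instantiation of the generic
  volatileStore(addr sink, x)

when isMainModule:
  var acc = 0.0
  for i in 0..<1000:
    acc += float(i)
  sinkValue(acc)
  echo acc
```

The trade-off is one real store per call instead of one volatile load plus a never-taken branch; which is cheaper in practice would need measuring.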
This post, https://yangzhang.tech/blog/2015/04/27/prevent-gcc-optimize-away-code/, points to the folly example.
In any case, we absolutely must guarantee that code isn't optimized away; otherwise we'll draw false conclusions from running benchmarks.
If you check out the generated C code, the barrier boils down to a call to inner, which is defined outside the module containing the benchmark. Unless you consider cross-module optimizations such as LTO, I believe the optimizer cannot do much here.