Add a DoNotOptimize that works in device code
It would be nice to have a utility similar to Google Benchmark's "DoNotOptimize" that attempts to prevent the compiler from optimizing away results.
GBench's version uses inline assembly tricks that won't work with PTX: even if the generated PTX escapes the optimizer, OCG and SASS generation will almost certainly undo the attempt. We'll have to be a little trickier with the device equivalent.
Here's my attempt at such a function:
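// Never set to true at runtime, but the compiler cannot assume that.
__device__ bool always_false{false};
// Pointer-to-const so that &t converts without a cast.
__device__ void const* volatile p;
template <typename T>
__device__
void DoNotOptimize(T const& t){
  if(always_false){
    p = &t;
  }
}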
Basically, the store of the object's address is guarded by a "runtime" predicate that is always false, so the store never executes, but the compiler cannot prove that.
This should prevent the compiler from discarding the computation that produces t, but there are no guarantees here.
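For illustration, here is a minimal sketch of how the helper might be used in a micro-benchmark kernel. The kernel `busy_kernel` and the host function `run_demo` are hypothetical names, not part of any existing library. The sink is declared `void const*` so that `&t` converts without a cast. Whether the loop survives still depends on the toolchain, so inspecting the generated SASS (for example with `cuobjdump -sass`) is the only real confirmation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Runtime predicate: never set to true. The assumption is that the compiler
// will not fold it to a constant, since a __device__ variable can be written
// from the host.
__device__ bool always_false{false};

// Sink for the address; volatile so the store itself is not elided.
__device__ void const* volatile p;

template <typename T>
__device__ void DoNotOptimize(T const& t) {
  if (always_false) {
    p = &t;
  }
}

// Hypothetical kernel: without DoNotOptimize, `acc` is never observed,
// so the whole loop would be a candidate for dead-code elimination.
__global__ void busy_kernel(int iterations) {
  float acc = static_cast<float>(threadIdx.x);
  for (int i = 0; i < iterations; ++i) {
    acc = acc * 1.0001f + 0.5f;
  }
  DoNotOptimize(acc);
}

// Hypothetical host driver: launches the kernel and reports any CUDA error.
// Returns 0 on success, 1 on failure.
int run_demo() {
  busy_kernel<<<1, 32>>>(1000);
  cudaError_t err = cudaGetLastError();
  if (err == cudaSuccess) {
    err = cudaDeviceSynchronize();
  }
  if (err != cudaSuccess) {
    std::printf("CUDA error: %s\n", cudaGetErrorString(err));
    return 1;
  }
  return 0;
}
```

Calling `run_demo()` from `main` and compiling with `nvcc` should be enough to try it out. Timing the kernel with and without the `DoNotOptimize(acc)` call is a quick way to see whether the loop was kept.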