BabelStream
std-indices: Capture in offload-friendly way
Previously, std-indices captured by reference. In an offload scenario, capture by value is generally preferred: if the reference points to stack memory on the host, offloaded kernels will encounter illegal memory accesses. Compilers generally cannot remedy this with transformations that make data GPU-accessible, because those only work for heap allocations.
The current code only works in an offload scenario because the stream class itself is allocated on the heap (which compilers can then make GPU-accessible), and the kernels can then reference the std::vector objects for the data.
This relies on an implementation detail and may be brittle if the architecture ever changes, or if someone wishes to reuse BabelStream code in a different context.
This PR therefore attempts to make things more robust by directly capturing data pointers by value.
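A minimal sketch of the pattern described above, not the actual BabelStream code: `TriadSketch`, its members, and `triad` are hypothetical names chosen for illustration. The raw data pointers are taken before the lambda is built, so the lambda copies only pointer values and never refers to the enclosing object or to the `std::vector` objects. The sketch runs sequentially so that it needs no parallel backend; an offloading build would presumably pass an execution policy such as `std::execution::par_unseq` as the first argument to `std::for_each`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a stream class; names are illustrative only.
struct TriadSketch {
  std::vector<double> a, b, c;
  std::vector<std::size_t> indices;  // 0, 1, ..., n-1, iterated by the kernel

  explicit TriadSketch(std::size_t n)
      : a(n, 0.0), b(n, 1.0), c(n, 2.0), indices(n) {
    std::iota(indices.begin(), indices.end(), std::size_t{0});
  }

  void triad(double scalar) {
    assert(a.size() == b.size() && b.size() == c.size());

    // Extract raw pointers to the heap-allocated element storage.
    double* pa = a.data();
    const double* pb = b.data();
    const double* pc = c.data();

    // Capture the pointers and the scalar by value. Writing [&] or [this]
    // here would instead make the kernel go through the host-side object.
    std::for_each(indices.begin(), indices.end(),
                  [pa, pb, pc, scalar](std::size_t i) {
                    pa[i] = pb[i] + scalar * pc[i];
                  });
  }
};
```

With this layout the kernel no longer depends on where the `TriadSketch` object itself lives (stack or heap), only on the element storage being accessible.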
On Intel iGPU, I see no substantial performance difference between the two versions in an offload scenario.
@illuhad Can you check if this is resolved in develop?
@tom91136 I think this is indeed resolved in develop. Now a, b, and c are pointers and are captured by value, which avoids capturing the std::vector object by reference and also avoids capturing a copy of the std::vector object itself.