alpaka
atomic_ref based atomics are too strong
The CPU atomic implementations based on std::atomic_ref use sequentially consistent memory ordering. This is a stronger guarantee than their CUDA counterparts provide: CUDA atomics are weakly ordered and always require explicit fences. The CPU atomics should therefore also be weakened to relaxed memory order, which could improve performance on CPUs.
I think on x86 they are the same, at least for read-modify-write operations; IIRC, it is only ARM and Power that have weaker atomics.