AdaptiveCpp
AdaptiveCpp copied to clipboard
[SYCL2020] Implement sycl::atomic_fence
Hi Aksel,
Less than a year later, I'm trying to bring it full circle here. 🥳 This (properly) closes #767.
Let me know what you think so far. I haven't implemented the SSCP path because I could only find existing implementations for barriers but not memory fences, and so I think I should wait for you to tell me where and how you want this. 🙂
Cheers, -N
Can we implement the old
mem_fence
on top of the newer one?
Done.
For SSCP I would suggest looking at barrier - every barrier is effectively a combination of a memfence and a blocking synchronization primitive. So probably we can factor the
mem_fence
out from there :)
Done.
~Can I get the declaration of __spirv_MemoryBarrier
from spirv_ops.hpp
in fence.cpp
or better not?~ I did not, just in case we want a clear separation.
~As written, the semantics for seq_cst
+ device
for explicit multipass SPIR-V and SSCP SPIR-V is different as the memory semantics flags differ. I suspect the former includes MemorySemanticsMask::WorkgroupMemory
because mem_fence
could (and still can) use fence_space::global_and_local
, but we were using it even with fence_space::global_space
. In the barrier implementation we do not include MemorySemanticsMask::WorkgroupMemory
. What should we do here?~
I've completely rewritten my implementation so it reflects my understanding of the intended semantics.
There are now no differences for SPIR-V.
For HIP-like devices, the SSCP path might be more precise, in the sense that the IR fence primitives we end up calling expose more choices than just __threadfence_block()
and __threadfence()
.
It may well be those IR primitives only implement sequential consistency at the workgroup and device scopes as well - I dunno - but we don't care about that... Indeed, if that were to change, SSCP compilations would automatically be taking advantage.