AMDGPU.jl
[Mark/Wait] Use HIP events to do fine-grained sync
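We shouldn't need to wait for the whole stream to finish, only for the portion of it that contains our launched kernels. Recording a HIP event right after those launches and waiting on that event, instead of synchronizing the entire stream, would block only until our work is done.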
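To illustrate the difference, here is a toy Python model of stream/event ordering. It is not AMDGPU.jl or HIP code: `SimStream`, `launch`, `record_event`, and `synchronize` are hypothetical names for this sketch. It assumes an in-order stream, where an event marker completes once all work enqueued before it has run. In HIP itself, the corresponding calls would be `hipEventRecord` followed by `hipEventSynchronize`, as opposed to `hipStreamSynchronize`.

```python
import queue
import threading


class SimStream:
    """Hypothetical toy in-order stream: one worker thread runs work in FIFO order.

    Models only the ordering semantics of a HIP stream and events; it is not HIP.
    """

    def __init__(self):
        self._q = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            work = self._q.get()
            try:
                work()
            finally:
                self._q.task_done()

    def launch(self, kernel):
        """Enqueue a 'kernel' (any callable) and return immediately, like an async launch."""
        self._q.put(kernel)

    def record_event(self):
        """Enqueue a marker; the returned event is set once all earlier work has run."""
        done = threading.Event()
        self._q.put(done.set)
        return done

    def synchronize(self):
        """Coarse sync: block until everything enqueued so far has finished."""
        self._q.join()


stream = SimStream()
results = []
gate = threading.Event()  # holds back the unrelated, later work

stream.launch(lambda: results.append("our kernel"))
ev = stream.record_event()  # marker placed right after our kernels
stream.launch(lambda: (gate.wait(), results.append("later work")))

ev.wait()        # fine-grained: returns as soon as our kernel has run
print(results)   # the later work is still blocked on the gate here
gate.set()
stream.synchronize()  # coarse: waits for the whole stream, including later work
print(results)
```

Because the marker sits in the same FIFO queue as the kernels, `ev.wait()` returns after "our kernel" runs even though the later work has not finished, while `synchronize()` has to wait for everything.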