libcudacxx

<cuda::barrier> should add try_wait

Open JAppleyard opened this issue 3 years ago • 0 comments

It can be beneficial to test the state of a barrier before the point where the barrier must be resolved. This allows one to hide the latency of testing the barrier behind computation. That latency can be quite significant if the barrier is in device or system memory.

For example, instead of:

    arrive(x)
    compute1
    compute2
    wait(x)    (usually issues a load which we wait on shortly after, exposing latency)

One could do:

    arrive(x)
    compute1
    x = try_wait(x)
    compute2
    wait(x)

Here the final wait can test the value of x that try_wait loaded earlier, so the latency of that load is overlapped with compute2.
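To make the proposed pattern concrete, here is a minimal host-side C++ sketch. It is an illustration only: `ToyBarrier`, its `try_wait` member, `worker`, and `run_example` are hypothetical names invented for this example and are not the `cuda::barrier` API. The toy barrier spins on an atomic phase counter, so the cost being hidden here is just a poll rather than a load from device or system memory, but the ordering of operations matches the one requested above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for a barrier with a non-blocking test.
// This is NOT cuda::barrier; it only illustrates the requested call pattern.
class ToyBarrier {
public:
    using token = std::uint64_t;

    explicit ToyBarrier(int expected) : expected_(expected), pending_(expected) {
        assert(expected > 0);
    }

    // Signal arrival and return a token naming the phase we arrived in.
    token arrive() {
        token current = phase_.load(std::memory_order_acquire);
        if (pending_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            // Last arriver: reset the count, then publish the new phase.
            pending_.store(expected_, std::memory_order_relaxed);
            phase_.store(current + 1, std::memory_order_release);
        }
        return current;
    }

    // Non-blocking: true once the phase named by the token has completed.
    bool try_wait(token t) const {
        return phase_.load(std::memory_order_acquire) != t;
    }

    // Blocking: poll until the phase named by the token has completed.
    void wait(token t) const {
        while (!try_wait(t)) std::this_thread::yield();
    }

private:
    const int expected_;
    std::atomic<int> pending_;
    std::atomic<token> phase_{0};
};

// arrive, compute1, try_wait, compute2, wait (only if still needed).
int worker(ToyBarrier& bar, int* slots, int me, int other) {
    slots[me] = 10 * (me + 1);             // data the other thread will read
    ToyBarrier::token tok = bar.arrive();
    int local = me + 1;                    // stands in for compute1
    bool done = bar.try_wait(tok);         // early test, result kept for later
    local *= 2;                            // stands in for compute2
    if (!done) bar.wait(tok);              // block only if the early test failed
    return local + slots[other];           // safe: the barrier phase has completed
}

int run_example() {
    ToyBarrier bar(2);
    int slots[2] = {0, 0};
    int results[2] = {0, 0};
    std::thread t0([&] { results[0] = worker(bar, slots, 0, 1); });
    std::thread t1([&] { results[1] = worker(bar, slots, 1, 0); });
    t0.join();
    t1.join();
    return results[0] + results[1];
}
```

If the early `try_wait` already observed completion, the final `wait` is skipped entirely; otherwise the thread falls back to blocking, so correctness does not depend on the early test succeeding.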

JAppleyard, Mar 05 '21 12:03