cccl: `cudax::stream::wait()` blocks the caller, while `cudax::stream::wait(event_ref)` does not

```cpp
cudax::stream A, B;

auto e = A.record_event();

A.wait();
// Blocks the calling host thread.
// Equivalent to `cudaStreamSynchronize`.

B.wait(e);
// Does NOT block the calling host thread.
// Equivalent to `cudaStreamWaitEvent`.
```

`cudax::stream::wait` has three overloads:

- `wait()`, which blocks the caller until all work submitted to the stream is complete.
- `wait(event_ref)`, which makes the stream wait for the event before executing any work submitted after the call. This does not block the caller.
- `wait(stream_ref)`, which makes the stream wait for the other stream's submitted work before executing any work submitted after the call. This does not block the caller.
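To make the difference concrete without a GPU, here is a minimal CPU-only sketch. It assumes a hypothetical `toy_stream` (one worker thread running queued tasks in FIFO order) and a hypothetical `toy_event`; neither is a cudax type, and the name `sync()` is borrowed from the rename discussed later in this thread. `wait(event)` only enqueues a dependency and returns, while `sync()` parks the calling thread until the queue has drained.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical one-shot event: signaled when its stream reaches the record point.
class toy_event {
 public:
  void signal() {
    { std::lock_guard<std::mutex> lk(m_); done_ = true; }
    cv_.notify_all();
  }
  void block_until_signaled() {
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [this] { return done_; });
  }
 private:
  std::mutex m_;
  std::condition_variable cv_;
  bool done_ = false;
};

// Hypothetical in-order "stream": a single worker thread runs tasks in FIFO order.
class toy_stream {
 public:
  toy_stream() : worker_([this] { run(); }) {}
  ~toy_stream() {
    { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
    cv_.notify_all();
    worker_.join();  // the worker drains any remaining tasks before exiting
  }
  void launch(std::function<void()> task) { enqueue(std::move(task)); }
  std::shared_ptr<toy_event> record_event() {
    auto e = std::make_shared<toy_event>();
    enqueue([e] { e->signal(); });
    return e;
  }
  // Like cudaStreamWaitEvent: enqueue a dependency and return immediately.
  // Only work launched on this stream afterwards is held back until e fires.
  void wait(std::shared_ptr<toy_event> e) {
    enqueue([e] { e->block_until_signaled(); });
  }
  // Like cudaStreamSynchronize: block the calling thread until all work
  // submitted so far has run.
  void sync() { record_event()->block_until_signaled(); }
 private:
  void enqueue(std::function<void()> task) {
    { std::lock_guard<std::mutex> lk(m_); queue_.push_back(std::move(task)); }
    cv_.notify_all();
  }
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return stop_ || !queue_.empty(); });
        if (queue_.empty()) return;  // stop requested and nothing left to run
        task = std::move(queue_.front());
        queue_.pop_front();
      }
      task();
    }
  }
  std::mutex m_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
  bool stop_ = false;
  std::thread worker_;  // declared last so it starts after the other members exist
};

// Returns the order in which the host and the two streams made progress.
std::vector<std::string> run_demo() {
  std::mutex log_m;
  std::vector<std::string> log;
  auto note = [&log_m, &log](const std::string& s) {
    std::lock_guard<std::mutex> lk(log_m);
    log.push_back(s);
  };
  std::promise<void> gate;  // lets the host hold back stream A's work
  std::shared_future<void> opened = gate.get_future().share();
  {
    toy_stream A, B;
    A.launch([opened, &note] { opened.wait(); note("A work"); });
    auto e = A.record_event();
    B.wait(e);  // returns at once, although A is still stuck behind the gate
    B.launch([&note] { note("B work"); });
    note("host continued");
    gate.set_value();  // release A
    B.sync();          // host blocks until "B work" (and hence A's work) is done
  }
  return log;
}
```

In `run_demo`, the host logs "host continued" while stream A is still held back by the gate. If `B.wait(e)` blocked the caller the way `sync()` does, the host would never reach `gate.set_value()` and the program would deadlock.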

All the functions in a well-designed overload set should:

- Perform the same fundamental operation, perhaps with different configurations or kinds of inputs, and
- Provide the same contract:
  - Identical parameters should have the same constness and mutability.
  - Any semantics that can have subtle and hazardous implications should be identical: allocation, algorithmic complexity, and synchronization.

The `wait(event_ref)` and `wait(stream_ref)` functions need a different name. In Thrust's `unique_stream`, I believe I called these `depend_on`.

Mar 13 '25 19:03 brycelelbach

This is really a bad name for people who have some basic knowledge of CUDA:

```cpp
A.wait();
```

Can we just call it `A.synchronize();` or `A.sync();` (matching `cuda.core`)?

Mar 13 '25 21:03 leofang

I think we should avoid using the word wait for a function that does not block the caller. Every synchronization primitive in the standard library that has a wait method blocks the caller.

https://duckduckgo.com/?sites=cppreference.com&q=wait&ia=web

The one that blocks should be the one called wait. The other one should have a different name.
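For comparison with the standard-library convention cited above, here is a short sketch of `std::future<T>::wait()`, which blocks the calling thread until the shared state is ready. The helper name `future_wait_blocks_until_ready` is hypothetical and exists only for illustration.

```cpp
#include <chrono>
#include <future>
#include <thread>

// std::future<T>::wait() parks the calling thread until the result is ready.
bool future_wait_blocks_until_ready() {
  std::promise<int> p;
  std::future<int> f = p.get_future();
  std::thread producer([&p] {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    p.set_value(42);
  });
  f.wait();  // blocks here until the producer calls set_value
  // Once wait() has returned, the result is available without further waiting.
  bool ready = f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
  int value = f.get();
  producer.join();
  return ready && value == 42;
}
```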

Apr 16 '25 15:04 brycelelbach

We ended up going the opposite way from what this issue proposed and renamed the argument-less `wait()` to `sync()`, as Leo suggested above. We decided that the established CUDA naming matters more than the C++ standard library naming in this case. Either way, the core issue, two APIs with the same name but very different semantics, is resolved.

https://github.com/NVIDIA/cccl/pull/4379

Oct 10 '25 01:10 pciolkosz
