[FEA]: Prototype a design to ensure asynchronous operations on different streams work nicely with `cuda::launch`

miscco opened this issue

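Asynchronous operations are notoriously difficult to get right: work enqueued on a stream may not have run yet when another stream, or the host, tries to use its results.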

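We need to ensure that any memory obtained from `cudaMallocAsync` is actually valid when we access it. Such an allocation is stream-ordered: it may only be used by work ordered after the allocation on the same stream, or by other streams once they have synchronized with that stream (for example, through an event).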

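Furthermore, several streams may interact with a single kernel (for example, one stream allocates or produces the kernel's inputs while the kernel itself is launched on another), so we need a design that allows `cuda::launch` to synchronize with those streams when necessary.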
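The ordering problem can be illustrated with the plain CUDA runtime API. The sketch below allocates with `cudaMallocAsync` on a producer stream, fills the buffer there, and then uses it from a kernel on a consumer stream, ordering the two streams with an event (`cudaEventRecord` followed by `cudaStreamWaitEvent`). This is roughly the kind of synchronization a `cuda::launch` design would have to insert or let users request; it does not use `cuda::launch` itself, and the names `fill`, `scale`, and `cross_stream_roundtrip` are hypothetical. It assumes CUDA 11.2 or newer and a device that supports stream-ordered memory pools.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Reports a failing CUDA call and returns false from the enclosing function.
// Sketch only: resources are not released on this error path.
#define CHECK(call)                                                      \
  do {                                                                   \
    cudaError_t err_ = (call);                                           \
    if (err_ != cudaSuccess) {                                           \
      std::fprintf(stderr, "%s failed: %s\n", #call,                     \
                   cudaGetErrorString(err_));                            \
      return false;                                                      \
    }                                                                    \
  } while (0)

// Hypothetical producer kernel: writes i into data[i].
__global__ void fill(int* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] = i;
}

// Hypothetical consumer kernel: doubles every element in place.
__global__ void scale(int* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= 2;
}

// Allocates and fills on one stream, consumes on another; returns true if
// the consumer saw the producer's writes (data[i] == 2 * i).
bool cross_stream_roundtrip(int n) {
  cudaStream_t producer, consumer;
  cudaEvent_t ready;
  CHECK(cudaStreamCreateWithFlags(&producer, cudaStreamNonBlocking));
  CHECK(cudaStreamCreateWithFlags(&consumer, cudaStreamNonBlocking));
  CHECK(cudaEventCreateWithFlags(&ready, cudaEventDisableTiming));

  const size_t bytes = static_cast<size_t>(n) * sizeof(int);
  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;
  int* d = nullptr;

  // Stream-ordered allocation: valid for work ordered after it on `producer`.
  CHECK(cudaMallocAsync(reinterpret_cast<void**>(&d), bytes, producer));
  fill<<<blocks, threads, 0, producer>>>(d, n);
  CHECK(cudaGetLastError());

  // Without this record/wait pair, `scale` could run before the allocation
  // and `fill` have completed on `producer`.
  CHECK(cudaEventRecord(ready, producer));
  CHECK(cudaStreamWaitEvent(consumer, ready, 0));

  scale<<<blocks, threads, 0, consumer>>>(d, n);
  CHECK(cudaGetLastError());

  std::vector<int> h(n);
  CHECK(cudaMemcpyAsync(h.data(), d, bytes, cudaMemcpyDeviceToHost, consumer));
  // Free on the consumer stream, after its last use of the buffer.
  CHECK(cudaFreeAsync(d, consumer));
  CHECK(cudaStreamSynchronize(consumer));

  bool ok = true;
  for (int i = 0; i < n; ++i) {
    if (h[i] != 2 * i) { ok = false; break; }
  }

  CHECK(cudaEventDestroy(ready));
  CHECK(cudaStreamDestroy(consumer));
  CHECK(cudaStreamDestroy(producer));
  return ok;
}

int main() {
  bool ok = cross_stream_roundtrip(1 << 20);
  std::printf("%s\n", ok ? "ok" : "mismatch");
  return ok ? 0 : 1;
}
```

An event is used instead of `cudaStreamSynchronize(producer)` so that the host never blocks and the dependency is resolved on the device. Whether `cuda::launch` should insert such a record/wait pair automatically or only on request is exactly the open design question of this issue.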

### Tasks
- [ ] https://github.com/NVIDIA/cccl/issues/2206
- [ ] [FEA]: Implement a design that allows `cuda::launch` to optionally synchronize with other streams

Jul 31 '24 16:07 miscco