UCT/CUDA_IPC: Implemented uct_cuda_ipc_rkey_ptr
What
Implements the function uct_cuda_ipc_rkey_ptr.
Why?
So that we can call ucp_rkey_ptr on CUDA memory.
This would allow us to write to a remote process's GPU memory from within a GPU kernel.
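__global__ void device_function(double *addr_p, double *A, double *B, double *C)
{
    // Global thread index (bounds check omitted for brevity)
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Do work: the result is written directly into the remote process's GPU memory
    addr_p[i] = A[i] * B[i] + C[i];
}
__host__ void host_function(...)
{
    void *addr_p;
    ucs_status_t status;
    // Do work
    // Get a locally usable pointer to the remote CUDA buffer
    status = ucp_rkey_ptr(rkey, raddr, &addr_p);
    if (status != UCS_OK) {
        return; // rkey_ptr is not available for this memory
    }
    // Do work
    device_function<<<...>>>((double *)addr_p, A, B, C);
}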
@Akshay-Venkatesh @bureddy
I was wondering if this addition to the cuda_ipc module could be up for discussion?