
cudaLaunchCooperativeKernel support

Open AlexanderZvyagin opened this issue 4 years ago • 2 comments

Would it be difficult to add support for the cudaLaunchCooperativeKernel() call?

Consider the kernel:

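#include <cassert>              // device-side assert()
#include <cooperative_groups.h>
namespace cg = cooperative_groups;
extern "C" __global__ void kernel (float *buf) {
    cg::grid_group grid = cg::this_grid();
    // is_valid() is true only when the kernel was launched cooperatively
    assert(grid.is_valid());
    // grid-wide barrier across all thread blocks
    grid.sync();
}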

The assertion fails when the kernel is launched with the kernel<<<dim1, dim2>>>(buf_gpu) syntax, because grid.is_valid() is false for an ordinary launch. Instead, cudaLaunchCooperativeKernel must be used:

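float *buf_gpu = .....;
// one pointer per kernel parameter, each pointing at the argument value
void *kernel_args[] = {&buf_gpu};
// dim1 = grid dimensions, dim2 = block dimensions
cudaLaunchCooperativeKernel((void*)kernel, dim1, dim2, kernel_args);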
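The kernel_args array above follows the usual CUDA launch convention: one void* per kernel parameter, each pointing at the host variable that holds the argument value. A host-only sketch (no GPU required) of how such an array is unpacked may make that clearer. The names fake_kernel, fake_launch, g_seen_buf and g_seen_n are hypothetical and are not part of CUDA or PyCUDA, and the two-parameter signature is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical record of what the stand-in kernel received.
static float *g_seen_buf = nullptr;
static int g_seen_n = 0;

// Hypothetical stand-in for a kernel with signature (float *buf, int n).
void fake_kernel(float *buf, int n) {
    g_seen_buf = buf;
    g_seen_n = n;
}

// Hypothetical launcher: args[i] points at the i-th argument value,
// so each entry is cast to "pointer to parameter type" and dereferenced.
void fake_launch(void **args) {
    float *buf = *static_cast<float **>(args[0]);
    int n = *static_cast<int *>(args[1]);
    fake_kernel(buf, n);
}
```

Note that the array stores the address of the pointer variable (&buf_gpu), not the device pointer itself; passing buf_gpu directly is a common mistake.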

It does not look very complicated. Why is it not yet implemented?

AlexanderZvyagin, Nov 14 '20 14:11

I agree that it wouldn't be very complicated. You would need to use the driver API functions. I'd be happy to review a PR.

inducer, Nov 15 '20 03:11