pycuda
cudaLaunchCooperativeKernel support
Would it be difficult to add support for the cudaLaunchCooperativeKernel() call?
Consider the kernel:
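#include <cassert>
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

extern "C" __global__ void kernel(float *buf) {
    // Handle to the whole grid; only valid under a cooperative launch.
    cg::grid_group grid = cg::this_grid();
    assert(grid.is_valid());
    // Grid-wide barrier across all thread blocks.
    grid.sync();
}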
The assertion fails when the kernel is launched with the usual kernel<<<dim1, dim2>>>(buf_gpu)
syntax, because the grid group is not valid for a regular launch. Instead, cudaLaunchCooperativeKernel must be used:
float *buf_gpu = .....;
void *kernel_args[] = {&buf_gpu};
cudaLaunchCooperativeKernel((void*)kernel, dim1, dim2, kernel_args);
It does not look very complicated. Why is it not implemented yet?
I agree that it wouldn't be very complicated. You would need to use the driver API functions. I'd be happy to review a PR.
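For reference, the driver API counterpart of cudaLaunchCooperativeKernel is cuLaunchCooperativeKernel. The following is a minimal host-side sketch of what a wrapper would end up calling, not PyCUDA code. It assumes the kernel above has been compiled to a file named kernel.cubin (a hypothetical name, as are the block size of 128 and the buffer size), and it sizes the grid from the occupancy query because a cooperative launch needs all blocks to be resident on the device at once.

```cpp
// Hypothetical sketch: launch a cooperative kernel through the CUDA driver API.
// Assumes the kernel from this thread was compiled to "kernel.cubin"
// (file name, block size, and buffer size are illustrative assumptions).
#include <cstdio>
#include <cstdlib>
#include <cuda.h>

// Abort with a readable message if a driver API call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        CUresult rc_ = (call);                                        \
        if (rc_ != CUDA_SUCCESS) {                                    \
            const char *msg_ = nullptr;                               \
            cuGetErrorString(rc_, &msg_);                             \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         msg_ ? msg_ : "unknown error");              \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

int main() {
    CHECK(cuInit(0));
    CUdevice dev;
    CHECK(cuDeviceGet(&dev, 0));

    // Cooperative launch is not available on every device, so query it first.
    int supported = 0;
    CHECK(cuDeviceGetAttribute(&supported,
                               CU_DEVICE_ATTRIBUTE_COOPERATIVE_LAUNCH, dev));
    if (!supported) {
        std::fprintf(stderr, "device does not support cooperative launch\n");
        return EXIT_FAILURE;
    }

    // Use the device's primary context.
    CUcontext ctx;
    CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK(cuCtxSetCurrent(ctx));

    // Load the precompiled module and look up the extern "C" kernel by name.
    CUmodule mod;
    CUfunction fn;
    CHECK(cuModuleLoad(&mod, "kernel.cubin"));
    CHECK(cuModuleGetFunction(&fn, mod, "kernel"));

    // Cap the grid at the number of blocks that can be co-resident.
    const int threads = 128;
    int blocksPerSm = 0, smCount = 0;
    CHECK(cuOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, fn,
                                                      threads, 0));
    CHECK(cuDeviceGetAttribute(&smCount,
                               CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT, dev));
    const unsigned blocks = static_cast<unsigned>(blocksPerSm * smCount);

    CUdeviceptr buf;
    CHECK(cuMemAlloc(&buf, static_cast<size_t>(blocks) * threads * sizeof(float)));

    // Kernel parameters are passed as an array of pointers to each argument.
    void *args[] = {&buf};
    CHECK(cuLaunchCooperativeKernel(fn,
                                    blocks, 1, 1,    // grid dimensions
                                    threads, 1, 1,   // block dimensions
                                    0,               // dynamic shared memory
                                    nullptr,         // default stream
                                    args));
    CHECK(cuCtxSynchronize());

    CHECK(cuMemFree(buf));
    CHECK(cuModuleUnload(mod));
    CHECK(cuDevicePrimaryCtxRelease(dev));
    return 0;
}
```

The call takes the same arguments as cuLaunchKernel minus the trailing extra-options parameter, so a binding could plausibly reuse the existing argument-packing path. The flags needed to build the cubin for grid-wide sync depend on the toolkit version and target architecture, so check the toolkit documentation for those.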