clad Call to `clad::gradient` for CUDA kernels cannot be compiled for GPU

Open kchristin22 opened this issue 6 months ago • 4 comments

Since `__global__` kernels cannot be called from other device functions the way ordinary device functions can, the following call can only be compiled for the host, like so:

#ifndef __CUDA_ARCH__
    auto kernel_g = clad::gradient(kernel);
#endif

When the derived kernel is executed, though, the device needs to recognize ("see") it. Since the call above is compiled only for the host, the device is not able to read the kernel at launch (it says: "device symbol not found").
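To make the mechanism concrete, here is a minimal sketch of how the `__CUDA_ARCH__` guard splits the host and device compilation passes. The helper names `is_device_pass` and `host_only_value` are hypothetical and exist only for illustration; the empty fallback macros are an assumption that lets the same file build with a plain C++ compiler, where only the host branch exists.

```cpp
// Fallback so the sketch also builds without a CUDA compiler (assumption:
// nvcc and clang in CUDA mode define __CUDACC__ and provide these qualifiers).
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: reports which compilation pass produced this code.
// __CUDA_ARCH__ is defined only while the device side is being compiled.
__host__ __device__ constexpr bool is_device_pass() {
#ifdef __CUDA_ARCH__
  return true;   // device pass
#else
  return false;  // host pass (or a plain C++ compiler)
#endif
}

// Mirrors the guard from the issue: this definition exists only in the host
// pass, so nothing with this name is ever emitted into the device code.
#ifndef __CUDA_ARCH__
inline int host_only_value() { return 42; }
#endif
```

Anything defined only inside the `#ifndef __CUDA_ARCH__` branch is invisible to the device pass, which is consistent with the missing symbol described above.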

A simple workaround is to generate a header file with a fixed name in the build folder (or in inc), as the compiler option -fgenerate-source-file does, place the derived kernel there, and include that file in Differentiator.h. The user would still call the gradient inside the macro guard shown above.
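As a rough sketch of what such a generated header might contain, the example below uses a hand-written stand-in for the derived kernel. Everything in it is an assumption made for illustration: the include-guard name, the example kernel `kernel` (computing `x * x`), and the name and parameter order of `kernel_grad`. It is not actual clad output. The point is only that a definition living in an unguarded header is seen by both the host and the device pass.

```cpp
// Hypothetical generated header; the guard name is an assumption.
#ifndef CLAD_GENERATED_KERNEL_DERIVATIVES_H
#define CLAD_GENERATED_KERNEL_DERIVATIVES_H

// Fallback so the sketch also builds with a plain C++ compiler (assumption).
#ifndef __CUDACC__
#define __global__
#endif

// Illustrative original kernel (not from the issue): out = x * x.
// `static` keeps one private copy per translation unit that includes this.
static __global__ void kernel(const double* x, double* out) {
  *out = (*x) * (*x);
}

// Hand-written stand-in for the derived kernel. Because it sits outside any
// __CUDA_ARCH__ guard, both compilation passes see its definition.
static __global__ void kernel_grad(const double* x, double* out,
                                   double* d_x, double* d_out) {
  (void)out;                      // the forward result is not needed here
  *d_x += 2.0 * (*x) * (*d_out);  // d(x*x)/dx = 2x, scaled by the output adjoint
}

#endif  // CLAD_GENERATED_KERNEL_DERIVATIVES_H
```

Under a CUDA compiler these would be launched with `<<<1, 1>>>` on device pointers; with the host fallback they can be called directly as ordinary functions, which is handy for checking the arithmetic.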

There are other approaches that could also work, such as loading the kernel onto the GPU manually using CUDA Modules. However, this may require the assembly code as a string rather than the C++ source.

Also, transferring (allocating) a function pointer from host to device using cudaMemcpyToSymbol does not work for kernels.

Maybe there could be a gradient_kernel function that is itself a kernel and takes the returned object as an argument (since kernels are void functions), but it is only speculation that this could work, and there may be problems with execution down the line.

Aug 08 '24 18:08 kchristin22
