Call to `clad::gradient` for CUDA kernels cannot be compiled for GPU
Since `__global__` kernels cannot be called from other device functions the way ordinary device functions can, the call to `clad::gradient` can only be compiled for the host, like so:
#ifndef __CUDA_ARCH__
auto kernel_g = clad::gradient(kernel);
#endif
To launch the derived kernel, however, the device must be able to see it. Because the gradient is compiled only for the host, the launch fails and the device reports "device symbol not found".
A simple workaround is to generate a header file with a fixed name in the build folder (or in `inc`), as the compiler option `-fgenerate-source-file` does, place the derived kernel there, and include that file in `Differentiator.h`. The user would then call the gradient inside the `#ifndef __CUDA_ARCH__` guard shown above.
There are other approaches that could also work, such as loading the kernel onto the GPU manually using CUDA Modules. However, this may require the kernel as assembly code (such as PTX) in a string rather than as C++ code.
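The CUDA Modules route mentioned above can be sketched with the CUDA driver API. This is only an illustration under stated assumptions: `launch_from_ptx` and `check` are hypothetical helpers, the PTX text for the derived kernel is assumed to be available as a NUL-terminated string, and nothing here reflects how Clad is implemented.

```cpp
#include <cuda.h>   // CUDA driver API; link with -lcuda
#include <cstdio>

// Returns true on success; prints the driver error name otherwise.
static bool check(CUresult r, const char* what) {
  if (r == CUDA_SUCCESS) return true;
  const char* name = nullptr;
  cuGetErrorName(r, &name);
  std::fprintf(stderr, "%s failed: %s\n", what, name ? name : "unknown");
  return false;
}

// Hypothetical helper: loads `ptx` (NUL-terminated PTX text), looks up the
// kernel called `name`, and launches it on a 1D grid with arguments `args`.
// `name` must match the symbol in the PTX, which is the mangled name
// unless the kernel was declared extern "C".
bool launch_from_ptx(const char* ptx, const char* name, void** args,
                     unsigned blocks, unsigned threads) {
  CUdevice dev;
  CUcontext ctx;
  if (!check(cuInit(0), "cuInit")) return false;
  if (!check(cuDeviceGet(&dev, 0), "cuDeviceGet")) return false;
  if (!check(cuDevicePrimaryCtxRetain(&ctx, dev), "cuDevicePrimaryCtxRetain"))
    return false;

  CUmodule mod;
  CUfunction fn;
  bool ok = check(cuCtxSetCurrent(ctx), "cuCtxSetCurrent") &&
            check(cuModuleLoadData(&mod, ptx), "cuModuleLoadData");
  if (ok) {
    ok = check(cuModuleGetFunction(&fn, mod, name), "cuModuleGetFunction") &&
         check(cuLaunchKernel(fn, blocks, 1, 1, threads, 1, 1,
                              0, nullptr, args, nullptr), "cuLaunchKernel") &&
         check(cuCtxSynchronize(), "cuCtxSynchronize");
    cuModuleUnload(mod);
  }
  cuDevicePrimaryCtxRelease(dev);  // balance the retain above
  return ok;
}
```

The `args` array holds one pointer per kernel parameter, in declaration order. The sketch illustrates the drawback noted above: the module is loaded from assembly text, not from the C++ source of the derived kernel.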
Also, transferring a function pointer from host to device using `cudaMemcpyToSymbol` does not work for kernels.
Maybe there could be a `gradient_kernel` function that is itself a kernel and takes the returned object as an argument (since kernels are `void` functions). This is only speculation, though, and there may be problems with execution down the line.