Compiling and linking cublas_v2: cuModuleLoadDataEx can't find named symbol
Since kernels can now launch other kernels (dynamic parallelism), I wanted to call cublas_v2 routines from inside my own kernels through PyCUDA. However, I can't figure out how to get this to compile.
Using this SourceModule definition:
kernel_mod = SourceModule("""
#include <cublas_v2.h>
__global__ void deviceReduceCublas(double *in, double *out, int N) {
cublasHandle_t cnpHandle;
cublasStatus_t status = cublasCreate(&cnpHandle);
status = cublasDasum(cnpHandle, N, in, 1, out);
cublasDestroy(cnpHandle);
}
""",
options=['-lcublas', '-lcublas_device', '-lcudadevrt'],
include_dirs=['/opt/cuda/include', '/opt/cuda/lib64'])
I get the error:
Traceback (most recent call last):
File "chaotic_neural.py", line 129, in <module>
include_dirs=['/opt/cuda/include', '/opt/cuda/lib64'])
File "/usr/lib/python3.6/site-packages/pycuda/compiler.py", line 265, in __init__
arch, code, cache_dir, include_dirs)
File "/usr/lib/python3.6/site-packages/pycuda/compiler.py", line 255, in compile
return compile_plain(source, options, keep, nvcc, cache_dir, target)
File "/usr/lib/python3.6/site-packages/pycuda/compiler.py", line 137, in compile_plain
stderr=stderr.decode("utf-8", "replace"))
pycuda.driver.CompileError: nvcc compilation of /tmp/tmp9po2la3f/kernel.cu failed
[command: nvcc --cubin -lcublas -lcublas_device -lcudadevrt -arch sm_50 -I/opt/cuda/include -I/opt/cuda/lib64 -I/usr/lib/python3.6/site-packages/pycuda/cuda kernel.cu]
[stderr:
ptxas fatal : Unresolved extern function 'cublasCreate_v2'
]
Any ideas how to get that working?
There were two problems with the above. 1) Nothing was linked against cuBLAS; adding -dlink fixes that. 2) After doing that, a lot of "conflicting declaration of C function" errors popped up. By default PyCUDA wraps the whole source in an extern "C" block, so the cuBLAS header ended up inside it. These errors are fixed by passing no_extern_c=True and writing the extern "C" block yourself, with the #include outside of it:
kernel_mod = SourceModule("""
#include "cublas_v2.h"
extern "C" {
__global__ void deviceReduceCublas(double *in, double *out, int N) {
cublasHandle_t cnpHandle;
cublasStatus_t status = cublasCreate(&cnpHandle);
status = cublasDasum(cnpHandle, N, in, 1, out);
cublasDestroy(cnpHandle);
}
}
""",
options=['-lcublas', '-lcublas_device', '-lcudadevrt', '-dlink'], no_extern_c=True)
However, a new problem has now appeared when the compiled module is loaded. Is this related to name mangling?
% optirun python chaotic_neural.py
Traceback (most recent call last):
File "chaotic_neural.py", line 130, in <module>
options=['-lcublas', '-lcublas_device', '-lcudadevrt', '-dlink'], no_extern_c=True, keep=True)
File "/usr/lib/python3.6/site-packages/pycuda/compiler.py", line 268, in __init__
self.module = module_from_buffer(cubin)
pycuda._driver.LogicError: cuModuleLoadDataEx failed: named symbol not found -
It further turns out that one can compile the above kernel into a cubin file by hand using nvcc and then load that module into PyCUDA. I wonder why PyCUDA can't do that by itself?
The following works nicely:
import pycuda.driver as cuda  # a CUDA context must already exist, e.g. via pycuda.autoinit
cuda_mod = cuda.module_from_file("kernel2.cubin")
reduce_fun = cuda_mod.get_function("deviceReduceCublas")
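The exact nvcc command used for the hand compilation is not given above. As a rough, untested sketch, it could be done in two steps: compile with relocatable device code, then device-link against cublas_device and cudadevrt into a cubin. The helper names build_nvcc_commands and compile_cubin are hypothetical. The file name kernel.cu and the architecture sm_50 are taken from the nvcc command in the first traceback, and the flag combination is an assumption that also requires a CUDA toolkit that ships libcublas_device.

```python
import subprocess


def build_nvcc_commands(source="kernel.cu", cubin="kernel2.cubin", arch="sm_50"):
    """Return two nvcc invocations as argument lists (nothing is executed here).

    Assumption: a two-step build (relocatable compile, then device link) is
    what produces a loadable cubin; the original post does not show its command.
    """
    obj = source.rsplit(".", 1)[0] + ".o"
    # Step 1: compile to an object containing relocatable device code, so the
    # calls into the device cuBLAS library may stay unresolved for now.
    compile_cmd = ["nvcc", "-arch=" + arch, "-rdc=true", "-c", source, "-o", obj]
    # Step 2: device-link against the device libraries and emit a cubin
    # instead of a host object.
    link_cmd = ["nvcc", "-arch=" + arch, "-dlink", "-cubin", obj,
                "-lcublas_device", "-lcudadevrt", "-o", cubin]
    return compile_cmd, link_cmd


def compile_cubin(**kwargs):
    """Run both steps; raises CalledProcessError if nvcc fails."""
    for cmd in build_nvcc_commands(**kwargs):
        subprocess.run(cmd, check=True)
```

Calling compile_cubin() in the directory that holds kernel.cu would then be expected to leave kernel2.cubin there, ready for module_from_file as shown above. Keeping the command construction separate from the subprocess call makes it easy to print the commands and run them manually first.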