cuda_hook

BUG: `dlopen("/usr/local/cuda/targets/x86_64-linux/lib/libcublas.so", RTLD_NOW | RTLD_LOCAL)` failed, symbol free_gemm_select version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

Open githubcq opened this issue 2 months ago • 0 comments

When running a TensorFlow MNIST training task, the run aborts with 'Check failed: cublas_handle'.

The failure is caused by a dlopen call. The complete call is `dlopen("/usr/local/cuda/targets/x86_64-linux/lib/libcublas.so", RTLD_NOW | RTLD_LOCAL)`, and the error it reports is 'Failed to open /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so: /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so: symbol free_gemm_select version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference'.
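To reproduce the loader error outside TensorFlow, the same dlopen call can be issued from a short script. This is only a minimal sketch: the helper name `try_dlopen` is hypothetical, and it assumes a Linux Python build where `os.RTLD_NOW` and `os.RTLD_LOCAL` are available. It must be run in the same environment (same library search path and preload settings) as the failing training task to be meaningful.

```python
import ctypes
import os


def try_dlopen(path):
    """Try to load a shared library with RTLD_NOW | RTLD_LOCAL.

    Returns (True, "") on success, or (False, message) where message is
    the error text reported by the dynamic loader.
    """
    flags = os.RTLD_NOW | os.RTLD_LOCAL
    try:
        # On POSIX, ctypes.CDLL passes `mode` through to dlopen().
        ctypes.CDLL(path, mode=flags)
    except OSError as exc:
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    # Path taken from the report above.
    ok, err = try_dlopen("/usr/local/cuda/targets/x86_64-linux/lib/libcublas.so")
    print("loaded" if ok else f"dlopen failed: {err}")
```

Running it once with the hook enabled and once without should show whether the unresolved-symbol error depends on the hook's libraries being on the search path.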

According to ldd and nm, libcublas.so depends on libcublasLt.so.11, which resolves to '/home/chenqian/Code/cuda_hook/output/lib64/libcublasLt.so.11'. In addition, the symbol free_gemm_select is not present in either '/home/chenqian/Code/cuda_hook/output/lib64/libcublasLt.so.11' or '/usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so.11'.
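The ldd check can be scripted so that it is easy to repeat with and without the hook. The sketch below is an illustration, not part of cuda_hook: `parse_ldd` and `resolve_dependency` are hypothetical helper names, and it assumes `ldd` is on the PATH and prints dependencies in the usual `soname => path (address)` form.

```python
import subprocess


def parse_ldd(output):
    """Map each soname in `ldd` output to the path it resolves to.

    Dependencies reported as "not found" map to None. Lines without
    "=>" (for example the vDSO or the dynamic loader) are skipped.
    """
    resolved = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue
        soname, _, rest = line.partition("=>")
        rest = rest.strip()
        if not rest or rest.startswith("not found"):
            resolved[soname.strip()] = None
        else:
            # Drop the trailing "(0x...)" load address.
            resolved[soname.strip()] = rest.split()[0]
    return resolved


def resolve_dependency(library, soname):
    """Run ldd on `library` and return the path `soname` resolves to."""
    out = subprocess.run(
        ["ldd", library], capture_output=True, text=True, check=True
    ).stdout
    return parse_ldd(out).get(soname)


if __name__ == "__main__":
    print(resolve_dependency(
        "/usr/local/cuda/targets/x86_64-linux/lib/libcublas.so",
        "libcublasLt.so.11",
    ))
```

If the printed path points into the hook's output directory rather than the CUDA installation, the loader is binding libcublas.so against the hook's copy of libcublasLt.so.11. Listing that file's dynamic symbols with `nm -D` then shows whether it exports what libcublas.so expects.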

Moreover, without cuda_hook the training task completes successfully.

[Four screenshots attached, dated 2024-04-23]

githubcq, Apr 23 '24 06:04