[BUG]: Error when building and loading fused_optim: /usr/bin/ld: cannot find -lcudart: No such file or directory
🐛 Describe the bug
Running applications/ChatGPT/examples/train_dummy.py fails with the following error:
=========================================================================================
No pre-built kernel is found, build and load the fused_optim kernel during runtime now
=========================================================================================
Detected CUDA files, patching ldflags
Emitting ninja build file /home/verigle/.cache/colossalai/torch_extensions/torch1.13_cu11.7/build.ninja...
Building extension module fused_optim...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] c++ colossal_C_frontend.o multi_tensor_sgd_kernel.cuda.o multi_tensor_scale_kernel.cuda.o multi_tensor_adam.cuda.o multi_tensor_l2norm_kernel.cuda.o multi_tensor_lamb.cuda.o -shared -L/home/verigle/miniconda3/envs/colossalai/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/home/verigle/miniconda3/envs/colossalai/lib64 -lcudart -o fused_optim.so
FAILED: fused_optim.so
c++ colossal_C_frontend.o multi_tensor_sgd_kernel.cuda.o multi_tensor_scale_kernel.cuda.o multi_tensor_adam.cuda.o multi_tensor_l2norm_kernel.cuda.o multi_tensor_lamb.cuda.o -shared -L/home/verigle/miniconda3/envs/colossalai/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/home/verigle/miniconda3/envs/colossalai/lib64 -lcudart -o fused_optim.so
/usr/bin/ld: cannot find -lcudart: No such file or directory
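A minimal sketch for checking one `-L` directory from the failing command. GNU ld resolves `-lcudart` by looking for `libcudart.so` (or `libcudart.a`) in each search directory, so a directory that holds only a versioned file such as `libcudart.so.11.0` does not satisfy it. `can_link` is a hypothetical helper, and checking `$CONDA_PREFIX/lib64` is an assumption based on the `-L` flag in the log above; the link can also succeed through `LIBRARY_PATH` or the linker's default directories, which this sketch does not check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if `-l<name>` could be resolved from <dir>.
# GNU ld looks for lib<name>.so, then lib<name>.a, in each -L directory;
# a versioned file like libcudart.so.11.0 alone does not match -lcudart.
can_link() {
  dir=$1
  name=$2
  [ -e "$dir/lib$name.so" ] || [ -e "$dir/lib$name.a" ]
}

# Assumption: the failing command passed -L$CONDA_PREFIX/lib64 (see log).
if [ -n "${CONDA_PREFIX:-}" ]; then
  if can_link "$CONDA_PREFIX/lib64" cudart; then
    echo "found libcudart in $CONDA_PREFIX/lib64"
  else
    echo "no libcudart.so/.a in $CONDA_PREFIX/lib64; ld reports 'cannot find -lcudart'"
  fi
fi
```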
Environment
Conda env: Python 3.8, CUDA 11.7, PyTorch 1.13
With `export LD_LIBRARY_PATH=/path/to/your/cuda/lib64:${LD_LIBRARY_PATH}`, the build still can't find -lcublas, -lcudart, and -lcurand.
However, with `export LIBRARY_PATH=/path/to/your/cuda/lib64:${LIBRARY_PATH}`, it worked for me.
I strongly suggest highlighting in the documentation that LIBRARY_PATH, rather than LD_LIBRARY_PATH, should be set.
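A minimal sketch of the workaround above, assuming CUDA is installed under `$CUDA_HOME` (defaulting here to `/usr/local/cuda`, which is an assumption; substitute your own path). GCC reads `LIBRARY_PATH` when it links, which is the step that failed; `LD_LIBRARY_PATH` is only read by the dynamic loader when `fused_optim.so` is later loaded, so setting it alone does not help the link. Setting both is harmless.

```shell
#!/usr/bin/env bash
# Assumption: CUDA lives under $CUDA_HOME; /usr/local/cuda is only a default.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

# Link time: gcc/g++ searches LIBRARY_PATH for -lcudart, -lcublas, -lcurand.
# ${VAR:+:$VAR} avoids a trailing empty entry when the variable was unset.
export LIBRARY_PATH="$CUDA_HOME/lib64${LIBRARY_PATH:+:$LIBRARY_PATH}"

# Run time: the dynamic loader searches LD_LIBRARY_PATH when loading the .so.
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# -lcudart needs the unversioned libcudart.so (or libcudart.a) in that directory.
if [ ! -e "$CUDA_HOME/lib64/libcudart.so" ]; then
  echo "warning: $CUDA_HOME/lib64/libcudart.so not found" >&2
fi
```

Run it with `source` (not as a subprocess) so the exported variables persist in the shell that later launches the training script.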
Hi @verigle, thanks for your contribution! @FrankLeeeee, can we fix this later? The pre-built kernel seems to cause trouble for many users.
Good suggestion! I will add such checks next week. Meanwhile, I am working on improving the kernel build in #2886.
sudo ln -s /usr/local/cuda/lib64/libcudart.so /usr/lib/libcudart.so
We have updated a lot. This issue was closed due to inactivity. Thanks.