
Compile bug: compiling llama.cpp against CUDA 12 succeeds but resulting binaries cannot find shared libraries

Open compilebunny opened this issue 11 months ago • 3 comments

Git commit

$ git rev-parse HEAD
9fbadaef4f0903c64895ba9c70f02ac6e6a4b41c

OS: Ubuntu 24 LTS

Compiling llama.cpp with CUDA using

cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER=/usr/bin/nvcc -DGGML_CCACHE=OFF
cmake --build build --config Release

succeeds, but executing llama-server produces:

llama-server: error while loading shared libraries: libcublas.so.11: cannot open shared object file: No such file or directory

Operating systems

Linux

GGML backends

CUDA

Problem description & steps to reproduce

Compiling llama.cpp with CUDA using

cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER=/usr/bin/nvcc -DGGML_CCACHE=OFF
cmake --build build --config Release

First Bad Commit

Unknown

Compile command

cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER=/usr/bin/nvcc -DGGML_CCACHE=OFF
cmake --build build --config Release

Relevant log output

llama-server: error while loading shared libraries: libcublas.so.11: cannot open shared object file: No such file or directory

compilebunny avatar Jan 25 '25 01:01 compilebunny

Did you move the file that was compiled by any chance? Or did you run the binary from the folder it was created in?

Mushoz avatar Jan 25 '25 21:01 Mushoz

One way to check how your system is resolving this shared library is to first look at the llama-server binary and see its needed dependencies:

$ readelf -d build/bin/llama-server

Dynamic section at offset 0x3e2e20 contains 38 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libcurl.so.4]
 0x0000000000000001 (NEEDED)             Shared library: [libllama.so]
 0x0000000000000001 (NEEDED)             Shared library: [libggml.so]
 0x0000000000000001 (NEEDED)             Shared library: [libggml-cpu.so]
 0x0000000000000001 (NEEDED)             Shared library: [libggml-cuda.so]
 0x0000000000000001 (NEEDED)             Shared library: [libggml-base.so]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [ld-linux-x86-64.so.2]
 0x000000000000001d (RUNPATH)            Library runpath: [/home/danbev/work/ai/llama.cpp/build/bin:]
...

This shows that libggml-cuda.so is a needed dependency, which we can then inspect in turn:

$ readelf -d /home/danbev/work/ai/llama.cpp/build/bin/libggml-cuda.so

Dynamic section at offset 0x3231bd0 contains 33 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libggml-base.so]
 0x0000000000000001 (NEEDED)             Shared library: [libcudart.so.12]
 0x0000000000000001 (NEEDED)             Shared library: [libcublas.so.12]
 0x0000000000000001 (NEEDED)             Shared library: [libcuda.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000e (SONAME)             Library soname: [libggml-cuda.so]
 0x000000000000001d (RUNPATH)            Library runpath: [/home/danbev/work/ai/llama.cpp/build/bin:]

Here we can see that it needs libcublas.so.12. We can inspect how the dynamic loader resolves this using the following command:

$ LD_DEBUG=libs,search ldd build/bin/libggml-cuda.so
...
    166419:	find library=libcublas.so.12 [0]; searching
    166419:	 search path=/home/danbev/work/ai/llama.cpp/build/bin:glibc-hwcaps/x86-64-v3:glibc-hwcaps/x86-64-v2:		(RUNPATH from file build/bin/libggml-cuda.so)
    166419:	  trying file=/home/danbev/work/ai/llama.cpp/build/bin/libcublas.so.12
    166419:	  trying file=glibc-hwcaps/x86-64-v3/libcublas.so.12
    166419:	  trying file=glibc-hwcaps/x86-64-v2/libcublas.so.12
    166419:	  trying file=libcublas.so.12
    166419:	 search cache=/etc/ld.so.cache
    166419:	  trying file=/usr/local/cuda/targets/x86_64-linux/lib/libcublas.so.12
...

Notice that in this case the path to libcublas.so.12 is taken from /etc/ld.so.cache, which was built with ldconfig. We can use the following command to see which paths are currently in the cache search list:

$ ldconfig -v                                                                      
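Alternatively (this is a general triage step, not something from the build itself), `ldconfig -p` prints the cached entries without rescanning, so grepping it shows which libcublas versions the loader already knows about; the fallback `echo` just keeps the pipeline from exiting non-zero when nothing matches:

```shell
# Print the loader cache and filter for cuBLAS entries. If only CUDA 11
# was ever registered with ldconfig, this lists libcublas.so.11 but not
# libcublas.so.12.
ldconfig -p | grep libcublas || echo "libcublas not in loader cache"
```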

On my system /etc/ld.so.conf contains the following:

$ cat /etc/ld.so.conf                                                           
include /etc/ld.so.conf.d/*.conf                                                

So to add a path to the loader cache, we can drop a file into the /etc/ld.so.conf.d/ directory and re-run ldconfig.
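For example, a sketch assuming the CUDA 12 toolkit put its libraries under /usr/local/cuda-12/lib64 (adjust the path to your install; this needs root and applies system-wide):

```shell
# Register the CUDA 12 library directory with the dynamic loader.
# The file name "cuda-12.conf" is arbitrary; only the .conf suffix matters.
echo "/usr/local/cuda-12/lib64" | sudo tee /etc/ld.so.conf.d/cuda-12.conf

# Rebuild /etc/ld.so.cache so the new directory takes effect.
sudo ldconfig
```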

You can also control the lookup of the shared libraries by setting the LD_LIBRARY_PATH environment variable.
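As a sketch (the /usr/local/cuda-12/lib64 path is again an assumption — point it at wherever libcublas.so.12 actually lives on your system):

```shell
# Prepend the CUDA 12 library directory to the loader's search path for
# this shell session only; no root access or cache rebuild is needed.
export LD_LIBRARY_PATH=/usr/local/cuda-12/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}

# Verify the directory is now first on the search path.
echo "$LD_LIBRARY_PATH"
```

Unlike the ld.so.conf.d approach, this only affects processes launched from the current shell, which makes it handy for testing before making a permanent change.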

Hopefully running the above commands on your system can shed some light on which paths your system is using for libcublas.so.

danbev avatar Jan 26 '25 06:01 danbev

Downloaded

$ git rev-parse HEAD
f35726c2fb0a824246e004ab4bedcde37f3f0dd0

and it works now.

compilebunny avatar Jan 26 '25 13:01 compilebunny

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Mar 12 '25 01:03 github-actions[bot]