ci: add linux binaries to release build
Adds Ubuntu 20.04 binaries to the releases, and also cuBLAS Linux builds.
I changed the path where the dynamic library is put. It was in the CMake build directory before; now it's next to the executables (down in bin/).
I always build shared, with a relative RPATH (so no `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:.` needed for libllama.so).
Distributing the lib makes life easier for wrappers (e.g. Python libs).
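Roughly, the configure step looks something like this; a minimal sketch using standard CMake options, not the PR's literal CI commands:

```sh
# Build shared, with a relative RPATH: $ORIGIN expands at load time to
# the directory containing the running binary, so executables find
# libllama.so sitting next to them without LD_LIBRARY_PATH.
cmake -B build \
    -DBUILD_SHARED_LIBS=ON \
    -DCMAKE_BUILD_RPATH='$ORIGIN' \
    -DCMAKE_INSTALL_RPATH='$ORIGIN'
cmake --build build --config Release

# Verify the RPATH/RUNPATH entry the dynamic loader will use:
readelf -d build/bin/main | grep -iE 'r(un)?path'
```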
not in release yet:
- avx512 (I can't test this)
- openblas (system lib, maybe ship it? see the `ldd` check below)
- clblast (system lib, maybe ship it?)
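One way to answer the "maybe ship?" question is to inspect what the binaries actually link against; anything beyond stock distro libraries would have to travel with the release:

```sh
# List the shared libraries the binary expects at runtime; libopenblas
# or libclblast showing up here means users must have them installed,
# or we need to bundle them next to the executable.
ldd build/bin/main
```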
example release: https://github.com/Green-Sky/llama.cpp/releases/tag/ci_cublas_linux-99b7d15
I think we should just add GNUInstallDirs in the CMakeLists.txt; that way distributors can configure the paths where the files get installed. `cmake --install` will also do stripping and RPATH configuration.
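A sketch of what that flow gives distributors; the prefix and staging paths here are illustrative:

```sh
# With include(GNUInstallDirs) plus install() rules in CMakeLists.txt,
# the install locations become configurable at configure/install time.
cmake -B build -DCMAKE_INSTALL_PREFIX=/usr
cmake --build build --config Release

# --strip strips the installed binaries; CMake also rewrites RPATHs to
# the install-time values (CMAKE_INSTALL_RPATH) during this step.
cmake --install build --prefix /tmp/llama-stage --strip
```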
> I think we should just add GNUInstallDirs in the CMakeLists.txt; that way distributors can configure the paths where the files get installed. `cmake --install` will also do stripping and RPATH configuration.
will look into that tomorrow
Update: the CUDA toolkit install now nukes the GitHub Actions runners; it uses too much disk space.
Maybe we can keep just one CUDA version?
> Maybe we can keep just one CUDA version?
I think one runner does one job at a time, so I don't think that would make a difference. Going to play around with selective installs again <.<
Also, I once saw a GH workflow where some non-essentials were deleted to make some space...
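That pattern usually looks like the step below; the paths are the usual space hogs on the hosted ubuntu runner image, but worth re-checking before relying on them:

```sh
# Delete preinstalled toolchains we don't need; together these free
# tens of GB on a stock GitHub-hosted ubuntu runner.
sudo rm -rf /usr/share/dotnet        # .NET SDKs
sudo rm -rf /usr/local/lib/android   # Android SDK/NDK
sudo rm -rf /opt/ghc                 # Haskell toolchains
df -h /                              # see how much was reclaimed
```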
OK, I managed to download everything and even run main, but I get this error:
CUDA error 222 at /home/runner/work/llama.cpp/llama.cpp/ggml-cuda.cu:1244: the provided PTX was compiled with an unsupported toolchain.
This is with release fafc8ae and the CUDA 12 version. The machine also has CUDA 12 and a 2080 Ti.
@SlyEcho btw, I switched to the "networked" installer, which just sets up an apt repo... but that works for us.
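For reference, that route is essentially the following; the keyring URL matches NVIDIA's Ubuntu 20.04 repo, but the exact package names and versions are assumptions to adjust for the targeted CUDA release:

```sh
# Register NVIDIA's apt repo via the cuda-keyring package, then install
# only what's needed to compile, instead of the multi-GB full toolkit.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-nvcc-11-7 libcublas-dev-11-7
```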
> OK, I managed to download everything and even run main, but I get this error:
>
> CUDA error 222 at /home/runner/work/llama.cpp/llama.cpp/ggml-cuda.cu:1244: the provided PTX was compiled with an unsupported toolchain.
>
> This is with release fafc8ae and the CUDA 12 version. The machine also has CUDA 12 and a 2080 Ti.
This looks very weird, no idea what is happening here. Since I don't use `nvprune`, I thought it would just work. I can run the CUDA 11.7 build just fine on my system. My driver is too old for 12...
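For what it's worth, error 222 typically means the driver's JIT is older than the toolkit that produced the embedded PTX; baking in real SASS for the target GPUs sidesteps the JIT entirely. A sketch, assuming the build goes through CMake's native CUDA support (`LLAMA_CUBLAS` is the flag llama.cpp used at the time; the architecture list is an assumption):

```sh
# Compile SASS for common consumer architectures (Pascal .. Ampere) so
# the driver never has to JIT-compile PTX from a newer toolchain.
cmake -B build \
    -DLLAMA_CUBLAS=ON \
    -DCMAKE_CUDA_ARCHITECTURES="60;70;75;80"
cmake --build build --config Release
```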
Isn’t there an image with CUDA already installed?
I plan to go with that approach for ROCm.
Image, hmm. Installing CUDA now only takes about as long as the compile (~1 min), so I don't really see the point of using Docker (I'm assuming that's what you mean by image).
https://github.com/Jimver/cuda-toolkit/issues/249
This made installing less than the full toolkit viable (without me manually setting up the apt sources :smile:).
Yeah, I meant Docker. AMD publishes their images with everything installed already, although I don't know if it's possible to redistribute some of those runtime components.
It would be cool for the Windows builds, those take forever. But for the Linux builds it's now <50% of total build time.
There is still the problem of distributing the binaries NOT every release; those uploads now take up a significant amount of time (comparatively).