container-images: add a kompute variant
Add a container variant that enables llama.cpp's kompute backend to enable GPU acceleration using Vulkan.
Tested with krunkit on Apple Silicon with mistral-7b-instruct-v0.2.Q4_0.gguf and Wizard-Vicuna-13B-Uncensored.Q4_0.gguf (models of >=13B parameters benefit the most from being offloaded to the GPU vs. running on the CPU).
TODO:
- [ ] ~~Teach ramalama to choose the best container image for the context.~~
- [x] Ensure every operation works transparently when operating on a container.
- [ ] ~~Add some Q4_0 models to shortnames.conf~~
- [ ] ~~Expose shortnames.conf into the container.~~
- [ ] ~~Expose llama.cpp's server port when running in a container.~~
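For reference, a smoke test of the variant might look something like this (a sketch only: the image name, model reference, and the `--image` option are assumptions here, not part of this change):

```shell
# Hypothetical smoke test: run one of the tested Q4_0 models while
# selecting the kompute container image (image name is an assumption).
ramalama --image quay.io/ramalama/kompute run mistral-7b-instruct-v0.2.Q4_0.gguf
```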
Would it make more sense to layer this on the original image?
`FROM quay.io/ramalama/ramalama:latest ...`
That's not a bad idea @rhatdan; we can just replace the CPU-only binaries with these kompute ones, which means less duplication.
We do have an issue, though. I've been trying to make these ubi9-based as Scott McCarthy requested, which I think makes sense, but UBI images are missing access to a small set of required packages. We can make it work by pulling the CentOS Stream ones via something like:
```dockerfile
FROM quay.io/ramalama/ramalama:latest

ARG LLAMA_CPP_SHA=2a24c8caa6d10a7263ca317fa7cb64f0edc72aae
# renovate: datasource=git-refs depName=ggerganov/whisper.cpp packageName=https://github.com/ggerganov/whisper.cpp gitRef=master versioning=loose type=digest
ARG WHISPER_CPP_SHA=5caa19240d55bfd6ee316d50fbad32c6e9c39528

RUN dnf config-manager --add-repo https://mirror.stream.centos.org/9-stream/AppStream/$(uname -m)/os/
RUN dnf config-manager --add-repo https://mirror.stream.centos.org/9-stream/BaseOS/$(uname -m)/os/
RUN curl -o /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-Official http://mirror.centos.org/centos/RPM-GPG-KEY-CentOS-Official
RUN rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-Official
RUN dnf install -y vulkan-headers vulkan-loader-devel vulkan-tools glslc \
      glslang && \
    dnf copr enable -y slp/mesa-krunkit epel-9-$(uname -m) && \
    dnf install -y mesa-vulkan-drivers-24.1.5-101 && \
    dnf clean all && \
    rm -rf /var/cache/*dnf*

ENV GGML_CCACHE=0

RUN git clone --recurse-submodules https://github.com/ggerganov/llama.cpp && \
    cd llama.cpp && \
    git reset --hard ${LLAMA_CPP_SHA} && \
    cmake -B build -DCMAKE_INSTALL_PREFIX:PATH=/usr -DGGML_KOMPUTE=1 -DGGML_CCACHE=0 && \
    cmake --build build --config Release -j $(nproc) && \
    cmake --install build && \
    cd / && \
    rm -rf llama.cpp

RUN git clone https://github.com/ggerganov/whisper.cpp.git && \
    cd whisper.cpp && \
    git reset --hard ${WHISPER_CPP_SHA} && \
    make -j $(nproc) && \
    mv main /usr/bin/whisper-main && \
    mv server /usr/bin/whisper-server && \
    cd / && \
    rm -rf whisper.cpp
```
But then it becomes kind of a hybrid UBI9/Stream image.
Could you also turn on x86_64 EPEL9 builds for this, @slp?
https://copr.fedorainfracloud.org/coprs/slp/mesa-krunkit/
I wanna try that out with an x86_64 GPU 😄
I think EPEL would be a better solution than CentOS Stream if the packages are all available.
They aren't in EPEL @rhatdan :'( They are in AppStream/BaseOS but not in UBI.
UBI repos seem to be a subset of the RHEL versions of the repos:
https://cdn-ubi.redhat.com/content/public/ubi/dist/ubi9/9/x86_64/appstream/os/Packages/
https://cdn-ubi.redhat.com/content/public/ubi/dist/ubi9/9/x86_64/baseos/os/Packages/
I'm not sure exactly what determines whether a package is RHEL-only or also UBI-accessible.
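One way to narrow that down is to ask dnf directly which configured repos carry a given package; a sketch, assuming a ubi9 container where the Stream/EPEL repos from the snippets above have been added (repo ids vary by host configuration, and this needs network access):

```shell
# For each configured repo, print its id if the package resolves there.
pkg=spirv-headers-devel
for repo in $(dnf repolist --quiet | awk 'NR>1 {print $1}'); do
    if dnf repoquery --quiet --disablerepo='*' --enablerepo="$repo" "$pkg" | grep -q .; then
        echo "$pkg available in $repo"
    fi
done
```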
I'm afraid the situation with the Vulkan-related packages in CentOS Stream 9 is kind of broken (e.g. spirv-headers-devel is nowhere to be found, even though some packages that depend on it are present in the repos). I can't even rebuild the same Mesa spec in my COPR.
I think F40 is the only option for now. We can easily switch to Stream 9 once the Vulkan situation is fixed, and eventually to ubi9.
@slp that package in particular looks like it's in EPEL9
This seemed to work reasonably ok:
```dockerfile
FROM quay.io/ramalama/ramalama:latest

ARG LLAMA_CPP_SHA=2a24c8caa6d10a7263ca317fa7cb64f0edc72aae
# renovate: datasource=git-refs depName=ggerganov/whisper.cpp packageName=https://github.com/ggerganov/whisper.cpp gitRef=master versioning=loose type=digest
ARG WHISPER_CPP_SHA=5caa19240d55bfd6ee316d50fbad32c6e9c39528

RUN dnf config-manager --add-repo https://mirror.stream.centos.org/9-stream/AppStream/$(uname -m)/os/
RUN dnf config-manager --add-repo https://mirror.stream.centos.org/9-stream/BaseOS/$(uname -m)/os/
RUN curl -o /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-Official http://mirror.centos.org/centos/RPM-GPG-KEY-CentOS-Official
RUN rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-Official
RUN dnf install -y vulkan-headers vulkan-loader-devel vulkan-tools glslc \
      glslang epel-release && \
    dnf install -y spirv-headers-devel && \
    dnf copr enable -y slp/mesa-krunkit epel-9-$(uname -m) && \
    dnf install -y mesa-vulkan-drivers-24.1.5-101 && \
    dnf clean all && \
    rm -rf /var/cache/*dnf*

ENV GGML_CCACHE=0

RUN git clone --recurse-submodules https://github.com/ggerganov/llama.cpp && \
    cd llama.cpp && \
    git reset --hard ${LLAMA_CPP_SHA} && \
    cmake -B build -DCMAKE_INSTALL_PREFIX:PATH=/usr -DGGML_KOMPUTE=1 -DGGML_CCACHE=0 && \
    cmake --build build --config Release -j $(nproc) && \
    cmake --install build && \
    cd / && \
    rm -rf llama.cpp

RUN git clone https://github.com/ggerganov/whisper.cpp.git && \
    cd whisper.cpp && \
    git reset --hard ${WHISPER_CPP_SHA} && \
    make -j $(nproc) && \
    mv main /usr/bin/whisper-main && \
    mv server /usr/bin/whisper-server && \
    cd / && \
    rm -rf whisper.cpp
```
Why are you rebuilding the whisper-server? Isn't the one in the parent layer the same?
Or do you have to compile it differently for kompute?
What is using the spirv-headers-devel package (from the `dnf install -y spirv-headers-devel` step)?
Should it just be used during the build and removed before the image is finished?
So it would be more like this:
```dockerfile
FROM quay.io/ramalama/ramalama:latest

ARG LLAMA_CPP_SHA=2a24c8caa6d10a7263ca317fa7cb64f0edc72aae

RUN dnf config-manager --add-repo https://mirror.stream.centos.org/9-stream/AppStream/$(uname -m)/os/
RUN dnf config-manager --add-repo https://mirror.stream.centos.org/9-stream/BaseOS/$(uname -m)/os/
RUN curl -o /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-Official http://mirror.centos.org/centos/RPM-GPG-KEY-CentOS-Official
RUN rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-Official
RUN dnf install -y vulkan-headers vulkan-loader-devel vulkan-tools glslc \
      glslang epel-release && \
    dnf install -y spirv-headers-devel && \
    dnf copr enable -y slp/mesa-krunkit epel-9-$(uname -m) && \
    dnf install -y mesa-vulkan-drivers-24.1.5-101 && \
    dnf clean all && \
    rm -rf /var/cache/*dnf*

RUN git clone --recurse-submodules https://github.com/ggerganov/llama.cpp && \
    cd llama.cpp && \
    git reset --hard ${LLAMA_CPP_SHA} && \
    cmake -B build -DCMAKE_INSTALL_PREFIX:PATH=/usr -DGGML_KOMPUTE=1 -DGGML_CCACHE=0 && \
    cmake --build build --config Release -j $(nproc) && \
    cmake --install build && \
    cd / && \
    rm -rf llama.cpp
```
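If the headers are only needed at build time, a multi-stage build could keep them out of the final image entirely. A rough sketch of the idea (the stage layout, staging path, and runtime package set are illustrative, not a tested Containerfile):

```dockerfile
# Build stage: carries the Vulkan/SPIR-V build dependencies.
FROM quay.io/ramalama/ramalama:latest AS builder
ARG LLAMA_CPP_SHA=2a24c8caa6d10a7263ca317fa7cb64f0edc72aae
RUN dnf install -y epel-release && \
    dnf install -y vulkan-headers vulkan-loader-devel glslc glslang spirv-headers-devel
RUN git clone --recurse-submodules https://github.com/ggerganov/llama.cpp && \
    cd llama.cpp && \
    git reset --hard ${LLAMA_CPP_SHA} && \
    cmake -B build -DCMAKE_INSTALL_PREFIX:PATH=/usr -DGGML_KOMPUTE=1 -DGGML_CCACHE=0 && \
    cmake --build build --config Release -j $(nproc) && \
    DESTDIR=/staging cmake --install build

# Final stage: only the runtime pieces (Vulkan loader + Mesa drivers).
FROM quay.io/ramalama/ramalama:latest
RUN dnf copr enable -y slp/mesa-krunkit epel-9-$(uname -m) && \
    dnf install -y vulkan-loader mesa-vulkan-drivers-24.1.5-101 && \
    dnf clean all
COPY --from=builder /staging/usr/ /usr/
```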
But there may be other reasons @slp cannot build the latest variant for EPEL9... I notice only an older version is built for EPEL9 aarch64...
If RHEL9/RHEL10 won't be suitable for this kind of thing, it would be nice to document somewhere which packages are missing in both RHEL9 and RHEL10, whether they are in EPEL, etc.
Do RHEL/RHEL AI customers have access to full-RHEL containers, not just the subset of packages in UBI? This is also something I am curious about...
Yes they have full access to RHEL content.
v2:
- Extend the existing Containerfile instead of building a new one.
- Use ubi9 instead of Fedora 40.
- Add a cli option to request GPU offloading.
I'm curious whether we actually need the llama.cpp downgrade, and it would be nice to see the total size difference between the cpuonly and cpuonly+kompute container images.
Sadly, we do need the downgrade because the kompute backend is broken in master. This happens frequently with pretty much every backend except the CPU one. I want to take a look and see if I can fix it in master myself, but I can't make any promises.
As for the size, this is what I'm getting here:
```
localhost/ramalama-cpuonly   latest   56a6236affe0   About a minute ago   660 MB
localhost/ramalama-kompute   latest   c312022f66a5   About an hour ago    862 MB
```
Should I add a commit renaming the directory? It feels weird to keep the name cpuonly.
Yes, rename it; given the limited size difference, I would prefer to keep a single image for CPU and Kompute.
cpuonly + kompute (merged image) = generic
Just a suggestion...
I would rather name it ramalama or vulkan. If ramalama, then others can build their vendor-specific images from it.