container-images: add a kompute variant
Add a container variant that enables llama.cpp's kompute backend to enable GPU acceleration using Vulkan.
Tested with krunkit on Apple Silicon with mistral-7b-instruct-v0.2.Q4_0.gguf and Wizard-Vicuna-13B-Uncensored.Q4_0.gguf (models of >=13B parameters benefit the most from being offloaded to the GPU vs. running on the CPU).
TODO:
- [ ] ~~Teach ramalama to choose the best container image for the context.~~
- [x] Ensure every operation works transparently when operating on a container.
- [ ] ~~Add some Q4_0 models to shortnames.conf~~
- [ ] ~~Expose shortnames.conf into the container.~~
- [ ] ~~Expose llama.cpp's server port when running in a container.~~
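For reference, a smoke test of the variant might look something like this (a sketch only: the image name, model reference, and the `--image` option are assumptions here, not part of this change):

```shell
# Hypothetical smoke test: run one of the tested Q4_0 models while
# selecting the kompute container image (image name is an assumption).
ramalama --image quay.io/ramalama/kompute run mistral-7b-instruct-v0.2.Q4_0.gguf
```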
Would it make more sense to layer this on the original image?
`FROM quay.io/ramalama/ramalama:latest ...`
That's not a bad idea @rhatdan; we can just replace the CPU-only binaries with these kompute ones, which means less duplication.
We do have an issue, though. I've been trying to make these ubi9-based as Scott McCarthy requested, which I think makes sense, but UBI images are missing access to a small set of required packages. We can make it work by pulling the CentOS Stream ones via something like:
```dockerfile
FROM quay.io/ramalama/ramalama:latest

ARG LLAMA_CPP_SHA=2a24c8caa6d10a7263ca317fa7cb64f0edc72aae
# renovate: datasource=git-refs depName=ggerganov/whisper.cpp packageName=https://github.com/ggerganov/whisper.cpp gitRef=master versioning=loose type=digest
ARG WHISPER_CPP_SHA=5caa19240d55bfd6ee316d50fbad32c6e9c39528

RUN dnf config-manager --add-repo https://mirror.stream.centos.org/9-stream/AppStream/$(uname -m)/os/
RUN dnf config-manager --add-repo https://mirror.stream.centos.org/9-stream/BaseOS/$(uname -m)/os/
RUN curl -o /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-Official http://mirror.centos.org/centos/RPM-GPG-KEY-CentOS-Official
RUN rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-Official
RUN dnf install -y vulkan-headers vulkan-loader-devel vulkan-tools glslc \
      glslang && \
    dnf copr enable -y slp/mesa-krunkit epel-9-$(uname -m) && \
    dnf install -y mesa-vulkan-drivers-24.1.5-101 && \
    dnf clean all && \
    rm -rf /var/cache/*dnf*

ENV GGML_CCACHE=0

RUN git clone --recurse-submodules https://github.com/ggerganov/llama.cpp && \
    cd llama.cpp && \
    git reset --hard ${LLAMA_CPP_SHA} && \
    cmake -B build -DCMAKE_INSTALL_PREFIX:PATH=/usr -DGGML_KOMPUTE=1 -DGGML_CCACHE=0 && \
    cmake --build build --config Release -j $(nproc) && \
    cmake --install build && \
    cd / && \
    rm -rf llama.cpp

RUN git clone https://github.com/ggerganov/whisper.cpp.git && \
    cd whisper.cpp && \
    git reset --hard ${WHISPER_CPP_SHA} && \
    make -j $(nproc) && \
    mv main /usr/bin/whisper-main && \
    mv server /usr/bin/whisper-server && \
    cd / && \
    rm -rf whisper.cpp
```
But then it becomes kind of a hybrid UBI9/Stream image.
Could you also turn on x86_64 EPEL9 builds for this, @slp?
https://copr.fedorainfracloud.org/coprs/slp/mesa-krunkit/
I wanna try that out with an x86_64 GPU 😄
I think EPEL would be a better solution than CentOS Stream if the packages are all available.
They aren't in EPEL @rhatdan :'( They are in AppStream/BaseOS but not in UBI.
UBI repos seem to be a subset of the RHEL versions of the repos:
https://cdn-ubi.redhat.com/content/public/ubi/dist/ubi9/9/x86_64/appstream/os/Packages/
https://cdn-ubi.redhat.com/content/public/ubi/dist/ubi9/9/x86_64/baseos/os/Packages/
I'm not sure exactly what determines whether a package is RHEL-only or also UBI-accessible.
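One way to narrow that down is to ask dnf directly which configured repos carry a given package; a sketch, assuming a ubi9 container where the Stream/EPEL repos from the snippets above have been added (repo ids vary by host configuration, and this needs network access):

```shell
# For each configured repo, print its id if the package resolves there.
pkg=spirv-headers-devel
for repo in $(dnf repolist --quiet | awk 'NR>1 {print $1}'); do
    if dnf repoquery --quiet --disablerepo='*' --enablerepo="$repo" "$pkg" | grep -q .; then
        echo "$pkg available in $repo"
    fi
done
```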
I'm afraid the situation with the Vulkan-related packages in CentOS Stream 9 is kind of broken (e.g. spirv-headers-devel is nowhere to be found, even though some packages that depend on it are present in the repos). I can't even rebuild the same Mesa spec in my COPR.
I think F40 is the only option for now. We can easily switch to Stream 9 once the Vulkan situation is fixed, and eventually to ubi9.
@slp that package in particular looks like it's in EPEL9
This seemed to work reasonably ok:
```dockerfile
FROM quay.io/ramalama/ramalama:latest

ARG LLAMA_CPP_SHA=2a24c8caa6d10a7263ca317fa7cb64f0edc72aae
# renovate: datasource=git-refs depName=ggerganov/whisper.cpp packageName=https://github.com/ggerganov/whisper.cpp gitRef=master versioning=loose type=digest
ARG WHISPER_CPP_SHA=5caa19240d55bfd6ee316d50fbad32c6e9c39528

RUN dnf config-manager --add-repo https://mirror.stream.centos.org/9-stream/AppStream/$(uname -m)/os/
RUN dnf config-manager --add-repo https://mirror.stream.centos.org/9-stream/BaseOS/$(uname -m)/os/
RUN curl -o /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-Official http://mirror.centos.org/centos/RPM-GPG-KEY-CentOS-Official
RUN rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-Official
RUN dnf install -y vulkan-headers vulkan-loader-devel vulkan-tools glslc \
      glslang epel-release && \
    dnf install -y spirv-headers-devel && \
    dnf copr enable -y slp/mesa-krunkit epel-9-$(uname -m) && \
    dnf install -y mesa-vulkan-drivers-24.1.5-101 && \
    dnf clean all && \
    rm -rf /var/cache/*dnf*

ENV GGML_CCACHE=0

RUN git clone --recurse-submodules https://github.com/ggerganov/llama.cpp && \
    cd llama.cpp && \
    git reset --hard ${LLAMA_CPP_SHA} && \
    cmake -B build -DCMAKE_INSTALL_PREFIX:PATH=/usr -DGGML_KOMPUTE=1 -DGGML_CCACHE=0 && \
    cmake --build build --config Release -j $(nproc) && \
    cmake --install build && \
    cd / && \
    rm -rf llama.cpp

RUN git clone https://github.com/ggerganov/whisper.cpp.git && \
    cd whisper.cpp && \
    git reset --hard ${WHISPER_CPP_SHA} && \
    make -j $(nproc) && \
    mv main /usr/bin/whisper-main && \
    mv server /usr/bin/whisper-server && \
    cd / && \
    rm -rf whisper.cpp
```
Why are you rebuilding the whisper-server? Isn't the one in the parent layer the same?
Or do you have to compile it differently for kompute?
What is using the spirv-headers-devel package (from the `dnf install -y spirv-headers-devel` step)?
Should it just be used during the build and removed before the image is finished?
So it would be more like this:
```dockerfile
FROM quay.io/ramalama/ramalama:latest

ARG LLAMA_CPP_SHA=2a24c8caa6d10a7263ca317fa7cb64f0edc72aae

RUN dnf config-manager --add-repo https://mirror.stream.centos.org/9-stream/AppStream/$(uname -m)/os/
RUN dnf config-manager --add-repo https://mirror.stream.centos.org/9-stream/BaseOS/$(uname -m)/os/
RUN curl -o /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-Official http://mirror.centos.org/centos/RPM-GPG-KEY-CentOS-Official
RUN rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-Official
RUN dnf install -y vulkan-headers vulkan-loader-devel vulkan-tools glslc \
      glslang epel-release && \
    dnf install -y spirv-headers-devel && \
    dnf copr enable -y slp/mesa-krunkit epel-9-$(uname -m) && \
    dnf install -y mesa-vulkan-drivers-24.1.5-101 && \
    dnf clean all && \
    rm -rf /var/cache/*dnf*

RUN git clone --recurse-submodules https://github.com/ggerganov/llama.cpp && \
    cd llama.cpp && \
    git reset --hard ${LLAMA_CPP_SHA} && \
    cmake -B build -DCMAKE_INSTALL_PREFIX:PATH=/usr -DGGML_KOMPUTE=1 -DGGML_CCACHE=0 && \
    cmake --build build --config Release -j $(nproc) && \
    cmake --install build && \
    cd / && \
    rm -rf llama.cpp
```
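If the headers are only needed at build time, a multi-stage build could keep them out of the final image entirely. A rough sketch of the idea (the stage layout, staging path, and runtime package set are illustrative, not a tested Containerfile):

```dockerfile
# Build stage: carries the Vulkan/SPIR-V build dependencies.
FROM quay.io/ramalama/ramalama:latest AS builder
ARG LLAMA_CPP_SHA=2a24c8caa6d10a7263ca317fa7cb64f0edc72aae
RUN dnf install -y epel-release && \
    dnf install -y vulkan-headers vulkan-loader-devel glslc glslang spirv-headers-devel
RUN git clone --recurse-submodules https://github.com/ggerganov/llama.cpp && \
    cd llama.cpp && \
    git reset --hard ${LLAMA_CPP_SHA} && \
    cmake -B build -DCMAKE_INSTALL_PREFIX:PATH=/usr -DGGML_KOMPUTE=1 -DGGML_CCACHE=0 && \
    cmake --build build --config Release -j $(nproc) && \
    DESTDIR=/staging cmake --install build

# Final stage: only the runtime pieces (Vulkan loader + Mesa drivers).
FROM quay.io/ramalama/ramalama:latest
RUN dnf copr enable -y slp/mesa-krunkit epel-9-$(uname -m) && \
    dnf install -y vulkan-loader mesa-vulkan-drivers-24.1.5-101 && \
    dnf clean all
COPY --from=builder /staging/usr/ /usr/
```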
But there may be other reasons @slp cannot build the latest variant for EPEL9... I notice only an older version is built for EPEL9 aarch64...
If RHEL9/RHEL10 won't be suitable for this kind of thing, it would be nice to document somewhere which packages are missing in both RHEL9 and RHEL10, whether they are in EPEL, etc.
Do RHEL/RHEL AI customers have access to full-RHEL containers, not just the subset of packages in UBI? This is also something I am curious about...
Yes they have full access to RHEL content.
v2:
- Extend the existing Containerfile instead of building a new one.
- Use ubi9 instead of Fedora 40.
- Add a cli option to request GPU offloading.
I'm curious whether we actually need the llama.cpp downgrade, and it would be nice to see the total size difference between the cpuonly and cpuonly+kompute container images.
Sadly, we do need the downgrade because the kompute backend is broken in master. This happens frequently with pretty much every backend except the CPU one. I want to take a look and see if I can fix it in master myself, but I can't make any promises.
As for the size, this is what I'm getting here:
```
localhost/ramalama-cpuonly   latest   56a6236affe0   About a minute ago   660 MB
localhost/ramalama-kompute   latest   c312022f66a5   About an hour ago    862 MB
```
Should I add a commit renaming the directory? It feels weird to keep the name cpuonly.
Yes, rename it; given the limited size difference, I would prefer to keep a single image for CPU and Kompute.
cpuonly + kompute (merged image) = generic
Just a suggestion...
I would rather name it ramalama or vulkan. If ramalama, then others can build their vendor-specific images from it.