Use precompiled llama Docker image
Description
This change modifies the Docker build for the API to use the precompiled llama Docker image instead of compiling llama from source on every build.
Changes
- Alter the API Docker build to remove the need to compile llama from source (see the sketch below)
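In sketch form (assumed stage names and paths; the exact lines in the real Dockerfile may differ), the swap looks like this:

```dockerfile
# Before: a builder stage cloned llama.cpp and compiled it with `make`.
# After: pull the prebuilt binary from the published image instead.
FROM ghcr.io/ggerganov/llama.cpp:light as llama_builder

FROM ubuntu:22.04 as api
# Assumption: the light image ships its compiled binary as /main, per
# the upstream main.Dockerfile; copy it in as the `llama` executable.
COPY --from=llama_builder /main /usr/local/bin/llama
```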
Issue
https://github.com/nsarrazin/serge/issues/48
Hey, apologies: `Dockerfile.api` is deprecated and no longer used. The `Dockerfile` at the root of the project is the one that is being used. But the same code should work there too! If you feel like moving it to the other file, I'll have a look; sounds like a great fix to solve a lot of headaches haha.
I'll also remove the deprecated Dockerfiles, sorry again for forgetting to remove them.
@nsarrazin no worries. I believe this should be addressed now. Let me know if you see any issues.
Trying to run it from within Serge, I get a server error. When running the `llama` executable directly, this is what I get:
```
root@74571ae41425:/usr/src/app# llama -m weights/ggml-alpaca-7B-q4_0.bin
main: seed = 1679773025
llama_model_load: loading model from 'weights/ggml-alpaca-7B-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
Illegal instruction (core dumped)
```
Is this something you know about?
I'm running this on:
- CPU: Ryzen 5 5600X (AVX & AVX2 compatible)
- OS: Pop!_OS 22.04, kernel 6.2.0-76060200-generic
- Docker Compose version v2.16.0
Interesting. I wasn't seeing that before, but I just did a complete rebuild on the image and am seeing it now.
I'll have to do some digging. No ideas offhand as to why this might be happening.
I'm also trying to run the Docker command directly from the README of llama.cpp with

```
docker run -v .:/models ghcr.io/ggerganov/llama.cpp:light -m models/ggml-alpaca-7B-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512
```

and no success so far there either.
So I'm not sure if it's a bug related to Serge or to the Docker image itself.
I've tried a few different versions of the binary copied from the Docker image in Serge now, and all of them exhibit the same behavior.
I'm hesitant to say that the llama.cpp Docker image has a problem. If you look through this file, it more or less does exactly what your source build does: https://github.com/ggerganov/llama.cpp/blob/master/.devops/main.Dockerfile
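For reference, that file is roughly the following shape (paraphrased, not a verbatim copy; the upstream file may have changed since):

```dockerfile
ARG UBUNTU_VERSION=22.04

# Build stage: compile llama.cpp with make, just like a source build
FROM ubuntu:$UBUNTU_VERSION as build
RUN apt-get update && apt-get install -y build-essential
WORKDIR /app
COPY . .
RUN make

# Runtime stage: ship only the compiled binary
FROM ubuntu:$UBUNTU_VERSION as runtime
COPY --from=build /app/main /main
ENTRYPOINT [ "/main" ]
```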
It looks like you have the git pull for the llama source pinned to a specific commit. Let me see if I can find a release for that commit to test in Docker.
Looks like the Docker image tied to the commit pinned in the current Serge Dockerfile is still exhibiting the same behavior. I'm still trying to wrap my head around how the Docker image binary could possibly differ from the one being compiled from source in Serge if they execute the same steps.
Have you by chance tested the current version of Serge without this change to see if this issue is present? I would test myself, but I'm having trouble with the source compilation step on my machine.
Alright, current findings:
- Building llama in Serge using the current Dockerfile: works fine, I get an output.
- Building llama in Serge, updating the branch used to `master-c2b25b6` (latest as of testing): works fine too, no issues.
- Replacing the build step with `FROM ghcr.io/ggerganov/llama.cpp:light as llama_builder`: I get the error `Illegal instruction (core dumped)`.
- Running the llama.cpp image directly, bypassing Serge completely with `docker run -v .:/models ghcr.io/ggerganov/llama.cpp:light -m models/ggml-alpaca-7B-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512`: I also get an error.
It might be worth trying a newer branch in the Serge Dockerfile, like `master-c2b25b6`, to see if it improves your compilation errors.
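A minimal sketch of what that bump could look like in a source-build stage, assuming the Dockerfile clones llama.cpp and that `master-c2b25b6` is one of llama.cpp's release tags (the actual lines in Serge's Dockerfile may differ):

```dockerfile
# Hypothetical builder stage; the checked-out ref is the only point here.
FROM ubuntu:22.04 as llama_builder
RUN apt-get update && apt-get install -y build-essential git
RUN git clone https://github.com/ggerganov/llama.cpp /llama.cpp \
 && cd /llama.cpp \
 && git checkout master-c2b25b6 \
 && make
```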
What happens if you run `ghcr.io/ggerganov/llama.cpp:light` alone?
> What happens if you run `ghcr.io/ggerganov/llama.cpp:light` alone?
That's what I did with the last step in my previous message; the command is taken straight from the README, except the bind mount, which I changed.
Is it working for you?
I downloaded the model, but it complains that it's too old... Which one are you using?
This is the one I used: https://huggingface.co/Pi3141/alpaca-7B-ggml/tree/main
It's too old, you gotta convert it. Serge does it for you automatically.
The script is here: https://github.com/nsarrazin/serge/blob/main/api/utils/convert.py
Just found this issue: https://github.com/ggerganov/llama.cpp/issues/402
Seems pretty relevant here based on the behavior.
Same as before. I have no idea how this correlates to using a precompiled binary vs. compiling it from scratch, though.
@mcsgroi So it seems the image gets built on every commit to master; maybe there was an issue. Try again with the latest one.
That does appear to be the case. I've tried a few images all the way back to the commit that was pinned in this original build process. All seemed to yield the same error result.
However, I just pulled the latest image from a few minutes ago and, wouldn't you know it, it worked. I'll get this updated.
Updated. Would appreciate if someone would give this a run on their machine to sanity check this version works as expected.
> Updated. Would appreciate if someone would give this a run on their machine to sanity check this version works as expected.
On it!
Ha, incredible, it seems to work now? Let's make sure the version is frozen and let's just use this one for now :laughing:
I asked the folks on Discord to have a review if possible, so I'll give them some time, but I'll probably merge soon. Works great for me! Thanks, I think it's a real upgrade.
Someone with a Mac M1 got the following issue:
```
[+] Building 0.4s (4/4) FINISHED
 => [internal] load build definition from Dockerfile                  0.0s
 => => transferring dockerfile: 32B                                   0.0s
 => [internal] load .dockerignore                                     0.0s
 => => transferring context: 32B                                      0.0s
 => [internal] load metadata for docker.io/library/ubuntu:22.04       0.3s
 => ERROR [internal] load metadata for ghcr.io/ggerganov/llama.cpp:light-19726169b379bebc96189673a19b89ab1d307659  0.3s
------
 > [internal] load metadata for ghcr.io/ggerganov/llama.cpp:light-19726169b379bebc96189673a19b89ab1d307659:
------
failed to solve: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to create LLB definition: no match for platform in manifest sha256:cf18d117cbf29013c5484d87e27c5dd478bc69cd448ef8b937d2847a7c1f8b81: not found
```
Maybe we could try to use a fallback somehow? If it fails to find an image, compile manually.
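One hypothetical direction (a sketch, not something this PR implements): expose the llama image as a build argument, so platforms without a matching prebuilt image, like the M1 above, can point the build at a locally compiled image instead. The `LLAMA_IMAGE` name here is made up for illustration.

```dockerfile
# Hypothetical: default to the prebuilt image, but let the builder
# override it, e.g. after compiling llama.cpp locally and tagging it:
#   docker compose build --build-arg LLAMA_IMAGE=llama-local
ARG LLAMA_IMAGE=ghcr.io/ggerganov/llama.cpp:light
FROM ${LLAMA_IMAGE} as llama_builder
```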
Heya! The website is spinning up but when trying to chat, I get this on two different machines with 16 GB+ of RAM.
@nsarrazin I'm not sure I have a good answer for your request. I'd have to do some research, but I don't know of a simple way to conditionally import different Docker images. I noticed that @gaby introduced a related change for this.
If there's something specific you had in mind and a change is still required, let me know.
@willjasen that looks like the issue I was seeing before I updated the image referenced in this PR (commit: 530f2b6). Have you tried rebuilding since that update with `docker compose up -d --build`?
I tried "docker build up -d --build" and it's the same thing.
> Heya! The website is spinning up but when trying to chat, I get this on two different machines with 16 GB+ of RAM.
Hey! Can you tell me more about your hardware? OS/CPU, etc.
Also, if you could run `docker compose exec serge llama -m weights/your_model_goes_here.bin` and tell me the output, that would be great.
@mcsgroi There have been quite a few commits; can you update the sha256 to the latest one?
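For context, the pinned form being bumped looks like this (using the tag from the M1 error output above; the latest tag would come from the llama.cpp package registry):

```dockerfile
# Pinning to a specific llama.cpp build by tag; updating means swapping
# in a newer tag published under ghcr.io/ggerganov/llama.cpp.
FROM ghcr.io/ggerganov/llama.cpp:light-19726169b379bebc96189673a19b89ab1d307659 as llama_builder
```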
I think we need https://github.com/ggerganov/llama.cpp/pull/514 to be merged before merging this, right? Otherwise we lose ARM support.