Use precompiled llama Docker image
Description
This change modifies the Docker build for the API to use the precompiled llama Docker image instead of compiling llama from source on every build.
Changes
- Alter the API Docker build to remove the need to compile llama from source (see the sketch below)
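In sketch form (assumed stage names and paths; the exact lines in the real Dockerfile may differ), the swap looks like this:

```dockerfile
# Before: a builder stage cloned llama.cpp and compiled it with `make`.
# After: pull the prebuilt binary from the published image instead.
FROM ghcr.io/ggerganov/llama.cpp:light as llama_builder

FROM ubuntu:22.04 as api
# Assumption: the light image ships its compiled binary as /main, per
# the upstream main.Dockerfile; copy it in as the `llama` executable.
COPY --from=llama_builder /main /usr/local/bin/llama
```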
Issue
https://github.com/nsarrazin/serge/issues/48
Hey, apologies: `Dockerfile.api` is deprecated and no longer used. The `Dockerfile` at the root of the project is the one that is being used. But the same code should work there too! If you feel like moving it to the other file, I'll have a look; sounds like a great fix to solve a lot of headaches haha.
I'll also remove the deprecated Dockerfiles, sorry again for forgetting to remove them.
@nsarrazin no worries. I believe this should be addressed now. Let me know if you see any issues.
Trying to run it from within Serge, I get a server error. When running the `llama` executable directly, this is what I get:
```
root@74571ae41425:/usr/src/app# llama -m weights/ggml-alpaca-7B-q4_0.bin
main: seed = 1679773025
llama_model_load: loading model from 'weights/ggml-alpaca-7B-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
Illegal instruction (core dumped)
```
Is this something you know about?
I'm running this on:
- CPU: Ryzen 5 5600X (AVX & AVX2 compatible)
- OS: Pop!_OS 22.04, kernel 6.2.0-76060200-generic
- Docker Compose version v2.16.0
Interesting. I wasn't seeing that before, but I just did a complete rebuild on the image and am seeing it now.
I'll have to do some digging. No ideas offhand as to why this might be happening.
I'm also trying to run the Docker command directly from the README of llama.cpp with

```
docker run -v .:/models ghcr.io/ggerganov/llama.cpp:light -m models/ggml-alpaca-7B-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512
```

and no success so far there either.
So I'm not sure if it's a bug related to Serge or to the Docker image itself.
I've tried a few different versions of the binary copied from the Docker image in Serge now, and all of them exhibit the same behavior.
I'm hesitant to say that the llama.cpp Docker image has a problem. If you look through this file, it more or less does exactly what your source build does: https://github.com/ggerganov/llama.cpp/blob/master/.devops/main.Dockerfile
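For reference, that file is roughly the following shape (paraphrased, not a verbatim copy; the upstream file may have changed since):

```dockerfile
ARG UBUNTU_VERSION=22.04

# Build stage: compile llama.cpp with make, just like a source build
FROM ubuntu:$UBUNTU_VERSION as build
RUN apt-get update && apt-get install -y build-essential
WORKDIR /app
COPY . .
RUN make

# Runtime stage: ship only the compiled binary
FROM ubuntu:$UBUNTU_VERSION as runtime
COPY --from=build /app/main /main
ENTRYPOINT [ "/main" ]
```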
It looks like you have the git pull for the llama source pinned to a specific commit. Let me see if I can find a release for that commit to test in Docker.
Looks like the Docker image tied to the commit pinned in the current Serge Dockerfile is still exhibiting the same behavior. I'm still trying to wrap my head around how the Docker image binary could possibly differ from the one being compiled from source in Serge if they execute the same steps.
Have you by chance tested the current version of Serge without this change to see if this issue is present? I would test myself, but I'm having trouble with the source compilation step on my machine.
Alright, current findings:
- Building llama in Serge using the current Dockerfile: works fine, I get an output.
- Building llama in Serge, updating the branch used to `master-c2b25b6` (latest as of testing): works fine too, no issues.
- Replacing the build step with `FROM ghcr.io/ggerganov/llama.cpp:light as llama_builder`: I get the error `Illegal instruction (core dumped)`.
- Running the llama.cpp image directly, bypassing Serge completely with `docker run -v .:/models ghcr.io/ggerganov/llama.cpp:light -m models/ggml-alpaca-7B-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512`: I also get an error.
It might be worth trying a newer branch in the Serge Dockerfile, like `master-c2b25b6`, to see if it improves your compilation errors.
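A minimal sketch of what that bump could look like in a source-build stage, assuming the Dockerfile clones llama.cpp and that `master-c2b25b6` is one of llama.cpp's release tags (the actual lines in Serge's Dockerfile may differ):

```dockerfile
# Hypothetical builder stage; the checked-out ref is the only point here.
FROM ubuntu:22.04 as llama_builder
RUN apt-get update && apt-get install -y build-essential git
RUN git clone https://github.com/ggerganov/llama.cpp /llama.cpp \
 && cd /llama.cpp \
 && git checkout master-c2b25b6 \
 && make
```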
What happens if you run `ghcr.io/ggerganov/llama.cpp:light` alone?
> What happens if you run `ghcr.io/ggerganov/llama.cpp:light` alone?
That's what I did with the last step in my previous message; the command is taken straight from the README, except the bind mount, which I changed.
Is it working for you?
I downloaded the model, but it complains that it's too old... Which one are you using?
This is the one I used: https://huggingface.co/Pi3141/alpaca-7B-ggml/tree/main
It's too old, you gotta convert it. Serge does it for you automatically.
The script is here: https://github.com/nsarrazin/serge/blob/main/api/utils/convert.py
Just found this issue: https://github.com/ggerganov/llama.cpp/issues/402
Seems pretty relevant here based on the behavior.
Same as before. I have no idea how this correlates to using a precompiled binary vs. compiling it from scratch, though.
@mcsgroi So it seems the image gets built on every commit to master; maybe there was an issue. Try again with the latest one.
That does appear to be the case. I've tried a few images all the way back to the commit that was pinned in this original build process. All seemed to yield the same error result.
However, I just pulled the latest image from a few minutes ago and, wouldn't you know it, it worked. I'll get this updated.
Updated. Would appreciate if someone would give this a run on their machine to sanity check this version works as expected.
> Updated. Would appreciate if someone would give this a run on their machine to sanity check this version works as expected.
On it!
Ha, incredible, it seems to work now? Let's make sure the version is frozen and let's just use this one for now :laughing:
I asked the folks on Discord to have a review if possible, so I'll give them some time, but I'll probably merge soon. Works great for me! Thanks, I think it's a real upgrade.
Someone with a Mac M1 got the following issue:
```
[+] Building 0.4s (4/4) FINISHED
 => [internal] load build definition from Dockerfile                  0.0s
 => => transferring dockerfile: 32B                                   0.0s
 => [internal] load .dockerignore                                     0.0s
 => => transferring context: 32B                                      0.0s
 => [internal] load metadata for docker.io/library/ubuntu:22.04       0.3s
 => ERROR [internal] load metadata for ghcr.io/ggerganov/llama.cpp:light-19726169b379bebc96189673a19b89ab1d307659  0.3s
------
 > [internal] load metadata for ghcr.io/ggerganov/llama.cpp:light-19726169b379bebc96189673a19b89ab1d307659:
------
failed to solve: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to create LLB definition: no match for platform in manifest sha256:cf18d117cbf29013c5484d87e27c5dd478bc69cd448ef8b937d2847a7c1f8b81: not found
```
Maybe we could try to use a fallback somehow? If it fails to find an image, compile manually.
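One hypothetical direction (a sketch, not something this PR implements): expose the llama image as a build argument, so platforms without a matching prebuilt image, like the M1 above, can point the build at a locally compiled image instead. The `LLAMA_IMAGE` name here is made up for illustration.

```dockerfile
# Hypothetical: default to the prebuilt image, but let the builder
# override it, e.g. after compiling llama.cpp locally and tagging it:
#   docker compose build --build-arg LLAMA_IMAGE=llama-local
ARG LLAMA_IMAGE=ghcr.io/ggerganov/llama.cpp:light
FROM ${LLAMA_IMAGE} as llama_builder
```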
Heya! The website is spinning up but when trying to chat, I get this on two different machines with 16 GB+ of RAM.
@nsarrazin I'm not sure I have a good answer for your request. I'd have to do some research, but I don't know of a simple way to conditionally import different Docker images. I noticed that @gaby introduced a related change for this.
If there's something specific you had in mind and a change is still required, let me know.
@willjasen that looks like the issue I was seeing before I updated the image referenced in this PR (commit: 530f2b6). Have you tried rebuilding since that update with `docker compose up -d --build`?
I tried "docker build up -d --build" and it's the same thing.
> Heya! The website is spinning up but when trying to chat, I get this on two different machines with 16 GB+ of RAM.
Hey! Can you tell me more about your hardware? OS/CPU, etc.
Also, if you could run `docker compose exec serge llama -m weights/your_model_goes_here.bin` and tell me the output, that would be great.
@mcsgroi There have been quite a few commits; can you update the sha256 to the latest one?
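For context, the pinned form being bumped looks like this (using the tag from the M1 error output above; the latest tag would come from the llama.cpp package registry):

```dockerfile
# Pinning to a specific llama.cpp build by tag; updating means swapping
# in a newer tag published under ghcr.io/ggerganov/llama.cpp.
FROM ghcr.io/ggerganov/llama.cpp:light-19726169b379bebc96189673a19b89ab1d307659 as llama_builder
```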
I think we need https://github.com/ggerganov/llama.cpp/pull/514 to be merged before merging this, right? Otherwise we lose ARM support.