🚀 Dockerize llamacpp
First of all, thank you to the entire community for its effort; the work being done here is impressive.
I'm going to try to do my bit by dockerizing this client and making it more accessible.
If you have time, I would recommend creating a pipeline to publish the image to Docker Hub, which would make it easier to use, e.g. docker pull ggerganov/llamacpp or similar.
To make it work, just execute these commands:
- Build the image (at the moment it does not exist on Docker Hub):
docker build -t llamacpp .
- Run the program:
docker run -v ./models:/models llamacpp -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512
If you want to run in interactive mode, don't forget to tell Docker that too (e.g. docker run -i).
docker run -v ./models:/models llamacpp -m /models/7B/ggml-model-q4_0.bin -t 8 -n 256 --repeat_penalty 1.0 --color -i -r "User:" \
-p \
"Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.
User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User:"
Something weird here. What am I doing wrong?
$ cat /data/llama/7B/params.json;echo
{"dim": 4096, "multiple_of": 256, "n_heads": 32, "n_layers": 32, "norm_eps": 1e-06, "vocab_size": -1}
$ docker run -v models:/models llamacpp-converter "/data/llama/7B" 1
Traceback (most recent call last):
File "/app/convert-pth-to-ggml.py", line 67, in <module>
with open(fname_hparams, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/data/llama/7B/params.json'
I believe you'd need to run:
docker run -v /data/llama:/models llamacpp-converter "/models/7B" 1
I don't think you have the volume mounted correctly.
Keep in mind that when you run the container, it runs in isolation, i.e. it does not have access to the files on your host. To give it access, you need to expose those files through a volume.
I detail it below:
docker run
# mount volume to expose my current working directory (pwd) subfolder "models" into container path /models-only-exists-in-your-container
-v $(pwd)/models:/models-only-exists-in-your-container
# specify which image you want to run
llamacpp-main
# llamacpp's normal arguments
-m /models-only-exists-in-your-container/7B/ggml-model-q4_0.bin
-p "Building a website can be done in 10 simple steps:"
-t 8
-n 512
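Put together as a single runnable command (the comments above are explanatory, not part of the command):
docker run -v $(pwd)/models:/models-only-exists-in-your-container llamacpp-main -m /models-only-exists-in-your-container/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512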
That works. Thx.
Where's the quantization step occurring?
Logically this should occur in the tools Dockerfile, which implies running make there too and having a wrapper script to call first convert-pth-to-ggml.py and then quantize.
However, there is discussion about adding 8-bit quantization, so really it might be better to first call the wrapper script with a param, say --convert, to do the conversion first, then call it again to quantize with --quantize <q4_0|q8_0>, for maximum flexibility. e.g.
docker run -v models:/models llamacpp-converter --convert "/models/7B/" 1
docker run -v models:/models llamacpp-converter --quantize q4_0 "/models/7B/"
EDIT: Issue #106 indicates that passing additional params to ./quantize.sh will become necessary as well.
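A minimal sketch of what such a wrapper entrypoint could look like (a hypothetical tools.sh; it assumes convert-pth-to-ggml.py and the compiled quantize binary live in /app, which may not match the final layout):

#!/bin/bash
# Hypothetical entrypoint sketch: dispatch on the first argument and
# forward the remaining arguments to the underlying tool.
set -e
case "$1" in
    --convert)
        shift
        python3 /app/convert-pth-to-ggml.py "$@"   # e.g. --convert "/models/7B/" 1
        ;;
    --quantize)
        shift
        /app/quantize "$@"   # extra params (see issue #106) pass straight through
        ;;
    *)
        echo "Usage: <--convert|--quantize> [args...]" >&2
        exit 1
        ;;
esac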
Done.
docker run -v $(pwd)/models:/models llamacpp-tools --quantize "/models/7B/ggml-model-f16.bin" "/models/7B/ggml-model-q4_0.bin"
docker run -v $(pwd)/models:/models llamacpp-tools --convert "/models/7B/" 1
Great job. Just a suggestion: What about adding the @gjmulder build instructions to the README?
We can add instructions for compiling the image locally. However, the simplest thing would be to publish the Docker image on Docker Hub; then there would be no need to clone the repository or anything similar, just to have Docker Engine or Docker Desktop installed.
docker run -v $(pwd)/models:/models ggerganov/llamacpp-tools --convert "/models/7B/" 1
or
docker run -v $(pwd)/models:/models ggerganov/llamacpp -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512
I can do it for you; I'll commit here before the PR is merged.
Hi @ggerganov
I've created a new GitHub Action, which only fires on a push to master.
If you look at the file, you'll see I have published the image to my own account in order to test it locally.
I would recommend that you create a Docker Hub account (if you don't already have one) and create a new repository with the name you want (e.g. llamacpp).
If you register as the user ggerganov, then we can publish the image ggerganov/llamacpp.
Once you're registered, you can generate a token for the GitHub Action to use. You should put both the user and the token in the GitHub secrets:
- DOCKERHUB_USERNAME
- DOCKERHUB_TOKEN
Before closing the PR, we must change the name of the image that we have in the pipeline yaml.
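For reference, the login step the action performs with those secrets is roughly equivalent to this CLI call (a sketch; the actual workflow would use docker/login-action):
echo "$DOCKERHUB_TOKEN" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin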
For those who want to try it, you can do it with the images that I have published in my account:
e.g. the light version (only main, 28.32 MB):
docker run -v $(pwd)/models:/models bernatvadell/llamacpp:latest -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512
or the full version (3.61 GB):
docker run -v $(pwd)/models:/models bernatvadell/llamacpp:full --run -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512
docker run -v $(pwd)/models:/models bernatvadell/llamacpp:full --convert "/models/7B/" 1
docker run -v $(pwd)/models:/models bernatvadell/llamacpp:full --quantize "/models/7B/ggml-model-f16.bin" "/models/7B/ggml-model-q4_0.bin" 2
Another option could be to use the GitHub registry which wouldn’t need any additional setup beyond pointing the builder to the right image name.
Yep; in any case, I'll adapt the YAML to the registry configuration.
Does this mean I don't have to create a Docker Hub account?
A short note here: since I am running Docker Desktop on Windows, I needed to change the $(pwd) to %cd%:
docker run -v %cd%/models:/models bernatvadell/llamacpp:full --run -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512
docker run -v %cd%/models:/models bernatvadell/llamacpp:full --convert "/models/7B/" 1
docker run -v %cd%/models:/models bernatvadell/llamacpp:full --quantize "/models/7B/ggml-model-f16.bin" "/models/7B/ggml-model-q4_0.bin" 2
Thanks for the great work!!
In light of the recent Docker policy changes, I would recommend to push to ghcr.io instead. See how to login to ghcr.io here: https://github.com/docker/login-action#github-container-registry
In addition, adding a README section on running with Docker would be useful.
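For anyone logging in by hand, authenticating against ghcr.io looks roughly like this (assuming a personal access token with the write:packages scope stored in $CR_PAT):
echo "$CR_PAT" | docker login ghcr.io -u <your_github_username> --password-stdin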
Yes, makes sense; I can do it tonight.
Ok, now the pipeline seems to work correctly.
The flow that I have defined is the following:
- Whenever a PR is opened, the image build will be launched, but no push will be made. This way we can validate that it still compiles correctly.
- When a push to master happens, it will compile and push to the GitHub registry:
- Light version (only includes main): ghcr.io/ggerganov/llama.cpp:light
- Full version (includes Python, main and the quantize scripts): ghcr.io/ggerganov/llama.cpp:full
Versioned images: on the other hand, each image will also be pushed tagged with the commit hash:
- ghcr.io/ggerganov/llama.cpp:light-<commit_hash>
- ghcr.io/ggerganov/llama.cpp:full-<commit_hash>
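So, for example, pulling a build pinned to a specific commit would look like:
docker pull ghcr.io/ggerganov/llama.cpp:light-<commit_hash>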
If you have any suggestions, welcome!
Thank you
@ggerganov can I do a squash?
Good morning!
I've included a couple of new commands in the bash tools:
- --download (-d): download the original LLaMA model from the CDN: https://agi.gpt4.org/llama/
- --all-in-one (-a): execute --download, --convert & --quantize in sequence (see the sketch below)
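A rough sketch of how --all-in-one could chain the existing commands internally (hypothetical; the actual script may differ), where the arguments are the models path and the model size:

all_in_one() {
    local path="$1" model="$2"              # e.g. "/models/" and 7B
    "$0" --download "$path" "$model"        # fetch the original weights from the CDN
    "$0" --convert "$path$model/" 1         # convert the .pth weights to ggml f16
    "$0" --quantize "$path$model/ggml-model-f16.bin" "$path$model/ggml-model-q4_0.bin" 2
}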
On the other hand, I have updated the README.md file explaining how to start using the Docker image.
Docker
Prerequisites
- Docker must be installed and running on your system.
- Create a folder to store big models & intermediate files (for example, I'm using /llama/models)
Images
We have two Docker images available for this project:
- ghcr.io/ggerganov/llama.cpp:full: This image includes both the main executable file and the tools to convert LLaMA models into ggml and to quantize them to 4-bit.
- ghcr.io/ggerganov/llama.cpp:light: This image only includes the main executable file.
Usage
The easiest way to download the models, convert them to ggml, and optimize them is with the --all-in-one command, which is included in the full Docker image.
docker run -v /llama/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-one "/models/" 7B
On completion, you are ready to play!
docker run -v /llama/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512
or with the light image:
docker run -v /llama/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512
@ggerganov can I do a squash?
Yes, almost always squash.
Sorry for the slow responses - very busy week...