
Multimodal Llama3 Support

Open · xx025 opened this issue 1 year ago · 1 comment

I came across a model on Hugging Face that adds multimodal support on top of Llama 3, Bunny-Llama-3-8B-V (bunny-llama), and I'd like to be able to deploy it with llama-cpp-python!

However, I found that the existing chat_format: llama-3 doesn't seem to support running it.

I converted it to GGUF format via llama.cpp and ran it with the following configuration:

python llama.cpp/convert.py \
Bunny-Llama-3-8B-V --outtype f16 \
--outfile converted.bin \
--vocab-type bpe
{
    "host": "0.0.0.0",
    "port": 8080,
    "api_key": "xx",
    "models": [
        {
            "model": "bunny-llama.gguf",
            "model_alias": "bunny-llama",
            "chat_format": "llama-3",
            "n_gpu_layers": -1,
            "offload_kqv": true,
            "n_threads": 12,
            "n_batch": 512,
            "n_ctx": 2048
        }
    ]    
}
python3 -m llama_cpp.server \
--config_file bunny-llama.json
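
Once the server is running, it exposes an OpenAI-compatible chat completions endpoint, and multimodal input is sent as an `image_url` content part in the message. A minimal sketch of such a request payload, assuming the server is reachable at `http://0.0.0.0:8080` with the `bunny-llama` alias from the config above (the base64 data URI is a placeholder, not a real image):

```python
import json

# Build an OpenAI-style multimodal chat request for the llama-cpp-python
# server. The data URI below is a stand-in for a real base64-encoded image.
payload = {
    "model": "bunny-llama",  # model_alias from the server config
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/png;base64,iVBOR..."},
                },
            ],
        }
    ],
    "max_tokens": 256,
}

# Serialize for a POST to http://0.0.0.0:8080/v1/chat/completions
# (e.g. with requests.post(url, data=body, headers={...})).
body = json.dumps(payload)
```

The `api_key` from the config would go in an `Authorization: Bearer xx` header on the request.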

xx025 avatar Apr 28 '24 13:04 xx025

Check out #1147; it should be merged soon. The only caveat is that you'll also need to use the llava example in llama.cpp to extract the image encoder when you quantize the model.
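
For reference, multimodal GGUF models in llama-cpp-python are typically run with a llava-style chat handler plus the extracted image encoder passed as `clip_model_path`. A hypothetical config sketch along those lines (the mmproj file name and the `llava-1-5` chat format are assumptions for illustration, not something verified against Bunny):

```json
{
    "host": "0.0.0.0",
    "port": 8080,
    "models": [
        {
            "model": "bunny-llama.gguf",
            "model_alias": "bunny-llama",
            "chat_format": "llava-1-5",
            "clip_model_path": "bunny-llama-mmproj.gguf",
            "n_gpu_layers": -1,
            "n_ctx": 2048
        }
    ]
}
```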

abetlen avatar Apr 28 '24 16:04 abetlen