llama-cpp-python
Multimodal Llama3 Support
I came across a model on Huggingface that supports multimodal Llama 3, Bunny-Llama-3-8B-V: bunny-llama, and I'd like to be able to deploy it using llama-cpp-python!
However, the existing chat_format: llama-3 doesn't seem to support running it.
I converted the model to GGUF format via llama.cpp and ran it with the following configuration:
python llama.cpp/convert.py \
Bunny-Llama-3-8B-V --outtype f16 \
--outfile converted.bin \
--vocab-type bpe
{
  "host": "0.0.0.0",
  "port": 8080,
  "api_key": "xx",
  "models": [
    {
      "model": "bunny-llama.gguf",
      "model_alias": "bunny-llama",
      "chat_format": "llama-3",
      "n_gpu_layers": -1,
      "offload_kqv": true,
      "n_threads": 12,
      "n_batch": 512,
      "n_ctx": 2048
    }
  ]
}
python3 -m llama_cpp.server \
--config_file bunny-llama.json
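Once the server is up it exposes an OpenAI-compatible API, so you can test the deployment with a plain HTTP request. A minimal sketch using only the standard library; the host, port, API key, and model alias below are taken from the config above, and the prompt is an arbitrary example:

```python
import json
import urllib.request

SERVER = "http://0.0.0.0:8080"  # "host"/"port" from the config above
API_KEY = "xx"                  # "api_key" from the config above


def build_chat_request(prompt: str, model: str = "bunny-llama") -> dict:
    """Build an OpenAI-style chat completion payload.

    `model` must match "model_alias" in the server config.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }


def send_chat_request(payload: dict) -> dict:
    """POST the payload to the server's OpenAI-compatible endpoint."""
    req = urllib.request.Request(
        f"{SERVER}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # Requires the server started above to be running.
    reply = send_chat_request(build_chat_request("Introduce yourself."))
    print(reply["choices"][0]["message"]["content"])
```

The same endpoint also accepts the official OpenAI Python client pointed at `http://0.0.0.0:8080/v1`.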
Check out #1147, it should be merged soon. The only caveat here is that you'll also need to use the llava example in llama.cpp to extract the image encoder when you quantize the model.
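For reference, the multimodal path in llama-cpp-python goes through a llava-style chat handler that takes the extracted image encoder separately from the language model. A minimal sketch, assuming the encoder was extracted to `mmproj.gguf` (both file paths here are placeholders, and whether `Llava15ChatHandler` is the right handler for Bunny depends on what #1147 merges):

```python
def build_image_message(image_url: str, prompt: str) -> list:
    """OpenAI-style multimodal message: the image, then the text prompt."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def run_multimodal(model_path: str, clip_path: str,
                   image_url: str, prompt: str) -> str:
    """Load the language model plus image encoder and describe an image."""
    # Imported here so the message helper above works even without
    # llama-cpp-python installed.
    from llama_cpp import Llama
    from llama_cpp.llama_chat_format import Llava15ChatHandler

    chat_handler = Llava15ChatHandler(clip_model_path=clip_path)
    llm = Llama(
        model_path=model_path,
        chat_handler=chat_handler,
        n_ctx=2048,       # images consume context, so keep this generous
        n_gpu_layers=-1,
    )
    out = llm.create_chat_completion(
        messages=build_image_message(image_url, prompt)
    )
    return out["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Placeholder paths: the converted language model and the image
    # encoder extracted with llama.cpp's llava example.
    print(run_multimodal("bunny-llama.gguf", "mmproj.gguf",
                         "https://example.com/cat.png",
                         "What is in this image?"))
```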