llama-cpp-python
Python bindings for llama.cpp
Added a gemma3 chat handler and fixed the image embedding; multiple images are supported. Included llama.cpp functions and structures:
- clip_image_load_from_bytes
- clip_image_batch_encode
- clip_image_preprocess
- clip_image_f32_batch_init
- clip_image_f32_batch_free
- clip_image_u8_init
- ...
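A minimal usage sketch, assuming the new handler is exposed as `Gemma3ChatHandler` in `llama_cpp.llama_chat_format` and mirrors the existing `Llava15ChatHandler` API; the class name and file paths are assumptions, not confirmed by the PR text:

```python
# Sketch only: Gemma3ChatHandler and the model/projector paths are assumed,
# following the pattern used by the existing Llava15ChatHandler.
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Gemma3ChatHandler

chat_handler = Gemma3ChatHandler(clip_model_path="mmproj-gemma3.gguf")
llm = Llama(
    model_path="gemma-3-4b-it.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,  # larger context to hold the image embeddings
)

# Multiple images in one message, using the OpenAI-style content list.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "file:///tmp/cat.png"}},
                {"type": "image_url", "image_url": {"url": "file:///tmp/dog.png"}},
                {"type": "text", "text": "Compare these two images."},
            ],
        }
    ]
)
print(response["choices"][0]["message"]["content"])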
On my Mac, running the following breaks:

```bash
pip install https://github.com/abetlen/llama-cpp-python/releases/download/v0.3.5-metal/llama_cpp_python-0.3.5-cp312-cp312-macosx_11_0_arm64.whl
```

```bash
deflate decompression error: invalid stored block lengths
```

I think there is something wrong with the wheel,...
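A possible workaround sketch, assuming the problem is the prebuilt wheel itself: build locally with Metal enabled instead of pulling the release wheel. The CMake flag is taken from the project README; the exact option name can differ between versions:

```bash
# Build the wheel locally with Metal enabled (flag per the README;
# older releases used -DLLAMA_METAL=on instead of -DGGML_METAL=on).
CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python --no-cache-dir --force-reinstall
```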
Seems to work, but someone who knows this project better please check the order in `apply_func()`.
I am trying to build llama-cpp-python with CUDA and it is failing. I have tried some of the suggestions posted here for similar issues and they aren't working...
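For reference, a clean-build sketch based on the install instructions in the README; it assumes the CUDA toolkit and a compiler are visible to CMake, and the flag name may differ on older versions:

```bash
# Force a source build with CUDA enabled (older versions used
# -DLLAMA_CUBLAS=on instead of -DGGML_CUDA=on).
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --no-cache-dir --force-reinstall --verbose
```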
# Prerequisites

- [x] I am running the latest code.
- [x] I followed the README.md.
- [x] I searched open/closed issues.
- [x] This is a reproducible error and...
# Environment

Latest version as of 2025-03-26.

# Issue

I tried to use this library to record the logits of my GGUF model, but there seemed to be some errors in the...
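A sketch of one supported way to record per-token log-probabilities with the high-level API (not a fix for the reported error; the model path is a placeholder):

```python
# logits_all=True keeps logits for every position, which the logprobs
# option of create_completion requires.
from llama_cpp import Llama

llm = Llama(model_path="./model.gguf", logits_all=True, n_ctx=2048)

out = llm.create_completion(
    "The capital of France is",
    max_tokens=8,
    logprobs=5,       # return the top-5 log-probs for each generated token
    temperature=0.0,
)
for tok, top in zip(out["choices"][0]["logprobs"]["tokens"],
                    out["choices"][0]["logprobs"]["top_logprobs"]):
    print(tok, top)
```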
Safeerchalil:codespace-automatic-barnacle-5gv4qx4j775wf47g

_Originally posted by @Safeerchalil in https://github.com/abetlen/llama-cpp-python/issues/1978#issuecomment-2797951584_
pip install llama-cpp-python
Hi, I'm currently facing a `tokenizer_name NotImplementedError` while testing a quantized `.gguf` model with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). The problem occurs with `--apply_chat_template`. Run command: `lm_eval --model gguf --model_args base_url=http://127.0.1.1:8080 --tasks gsm8k --output_path result/gsm8k...`
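For context, a sketch of the setup the command above assumes: serve the quantized model with llama-cpp-python's OpenAI-compatible server, then point lm_eval's gguf backend at it. Paths and ports are placeholders, and this alone does not resolve the `--apply_chat_template` tokenizer error:

```bash
# Start the OpenAI-compatible server for the quantized model.
python -m llama_cpp.server --model ./model-q4_k_m.gguf --host 127.0.1.1 --port 8080 &

# Run the harness against the running server.
lm_eval --model gguf \
  --model_args base_url=http://127.0.1.1:8080 \
  --tasks gsm8k \
  --output_path result/gsm8k
```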
# Prerequisites

Please answer the following questions for yourself before submitting an issue.

- [x] I am running the latest code. Development is very rapid so there are no tagged...