
Python bindings for llama.cpp

Results 424 llama-cpp-python issues

# Prerequisites - [x] I am running the latest code. - [x] I carefully followed the [README.md](https://github.com/abetlen/llama-cpp-python/blob/main/README.md). - [x] I [searched using keywords relevant to my issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/filtering-and-searching-issues-and-pull-requests) to make sure...

Hello and thanks for this project! I am simply fixing the erroneous chat handler class name `NanollavaChatHandler` -> `NanoLlavaChatHandler`

Fixes multi-sequence (batch) embeddings by handling `n_seq_max` and `kv_unified` flags. See discussion in #2051.
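The fix above concerns how many sequences may be decoded at once. As a minimal illustrative sketch (not the actual patch, which lives in llama-cpp-python's batch handling), inputs can be chunked so that no batch exceeds the `n_seq_max` limit:

```python
# Hedged sketch: chunk embedding inputs so that at most n_seq_max
# sequences are submitted per decode batch. Illustrative only; the
# real fix also has to respect the kv_unified flag on the context.
from typing import List

def chunk_by_seq_max(texts: List[str], n_seq_max: int) -> List[List[str]]:
    """Split inputs into batches of at most n_seq_max sequences."""
    if n_seq_max < 1:
        raise ValueError("n_seq_max must be >= 1")
    return [texts[i:i + n_seq_max] for i in range(0, len(texts), n_seq_max)]

batches = chunk_by_seq_max(["a", "b", "c", "d", "e"], n_seq_max=2)
# batches == [["a", "b"], ["c", "d"], ["e"]]
```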

# Expected Behavior Should produce a similar output format to llama.cpp # Current Behavior The output is wrong. Maybe related to the Harmony format? The current output of llama-cpp-python: Answer:...

# Prerequisites
```python
llm = Llama(
    model_path="/home/axyo/dev/LLM/models/Meta-Llama-3-8B-Instruct-GGUF-v2/Meta-Llama-3-8B-Instruct-v2.Q5_0.gguf",
    n_gpu_layers=-1,
    seed=8,
    n_ctx=4096,
    logits_all=True,
    kv_overrides={"tokenizer.ggml.eos_token_id": 128002},
)
prompt = """user What is a dog?assistant A dog, also known as Canis lupus familiaris, is...
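The snippet above overrides a GGUF metadata key via `kv_overrides`. As a hedged sketch of what such overrides involve, llama.cpp expects each override value to carry one of a few concrete types; the helper below (a hypothetical name, not part of llama-cpp-python's API) shows one way to tag a Python override dict with inferred types:

```python
# Hedged sketch: tag each kv_overrides entry with the value type
# (bool / int / float / str) that llama.cpp distinguishes between.
# `coerce_overrides` is hypothetical and for illustration only.
from typing import Dict, Tuple, Union

Override = Union[int, float, bool, str]

def coerce_overrides(overrides: Dict[str, Override]) -> Dict[str, Tuple[str, Override]]:
    """Tag each override with its inferred value type."""
    typed: Dict[str, Tuple[str, Override]] = {}
    for key, value in overrides.items():
        if isinstance(value, bool):       # must check bool before int
            typed[key] = ("bool", value)
        elif isinstance(value, int):
            typed[key] = ("int", value)
        elif isinstance(value, float):
            typed[key] = ("float", value)
        elif isinstance(value, str):
            typed[key] = ("str", value)
        else:
            raise ValueError(f"unsupported override type for {key!r}")
    return typed

print(coerce_overrides({"tokenizer.ggml.eos_token_id": 128002}))
```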

bug

**Description:** I am trying to install llama-cpp-python with CUDA support, however I run into build errors. All the information is attached below. I can install it without GPU support just...

It could be cool to support audio capabilities, as they are now experimentally implemented in llama.cpp for models such as qwen-2.5-omni :)

This issue concerns the llama-cpp-python community but was filed on the llama.cpp tracker first: https://github.com/ggml-org/llama.cpp/issues/14847. I just wanted to bring it to your attention. I can relocate the issue if...

This is a PR to add support for loading and changing LoRA adapters at runtime as introduced into llama.cpp in https://github.com/ggerganov/llama.cpp/pull/8332 by @ngxson. Adding this support should allow things like...
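The workflow this PR enables is: load one or more LoRA adapters once, then enable, rescale, or disable them per request without reloading the base model. The sketch below only mirrors that shape in plain Python; the class and method names are hypothetical, not llama-cpp-python's actual bindings (which wrap llama.cpp's `llama_lora_adapter_*` functions):

```python
# Hedged sketch of a runtime LoRA adapter registry. All names here
# (AdapterRegistry, load, set_scale, clear) are hypothetical and
# only illustrate the load-once / switch-at-runtime workflow.
from typing import Dict

class AdapterRegistry:
    def __init__(self) -> None:
        self._loaded: Dict[str, str] = {}    # name -> adapter file path
        self._active: Dict[str, float] = {}  # name -> applied scale

    def load(self, name: str, path: str) -> None:
        """Load an adapter once; it can be applied repeatedly later."""
        self._loaded[name] = path

    def set_scale(self, name: str, scale: float) -> None:
        """Apply a loaded adapter at the given scale for the next requests."""
        if name not in self._loaded:
            raise KeyError(f"adapter {name!r} not loaded")
        self._active[name] = scale

    def clear(self) -> None:
        """Disable all adapters, falling back to the base model."""
        self._active.clear()

reg = AdapterRegistry()
reg.load("style", "adapters/style.gguf")  # hypothetical path
reg.set_scale("style", 0.8)
```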