
Python bindings for llama.cpp

Results 424 llama-cpp-python issues

# Prerequisites - [x] I am running the latest code. - [x] I carefully followed the [README.md](https://github.com/abetlen/llama-cpp-python/blob/main/README.md). - [x] I [searched using keywords relevant to my issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/filtering-and-searching-issues-and-pull-requests) to make sure...

Hello and thanks for this project! I am simply fixing the erroneous chat handler class name `NanollavaChatHandler` -> `NanoLlavaChatHandler`

Fixes multi-sequence (batch) embeddings by handling `n_seq_max` and `kv_unified` flags. See discussion in #2051.
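The fix above concerns how many sequences may be decoded at once. As a minimal illustrative sketch (not the actual patch, which lives in llama-cpp-python's batch handling), inputs can be chunked so that no batch exceeds the `n_seq_max` limit:

```python
# Hedged sketch: chunk embedding inputs so that at most n_seq_max
# sequences are submitted per decode batch. Illustrative only; the
# real fix also has to respect the kv_unified flag on the context.
from typing import List

def chunk_by_seq_max(texts: List[str], n_seq_max: int) -> List[List[str]]:
    """Split inputs into batches of at most n_seq_max sequences."""
    if n_seq_max < 1:
        raise ValueError("n_seq_max must be >= 1")
    return [texts[i:i + n_seq_max] for i in range(0, len(texts), n_seq_max)]

batches = chunk_by_seq_max(["a", "b", "c", "d", "e"], n_seq_max=2)
# batches == [["a", "b"], ["c", "d"], ["e"]]
```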

# Expected Behavior Should produce a similar output format to llama.cpp # Current Behavior The output is wrong. Maybe related to the Harmony format? The current output of llama-cpp-python: Answer:...

# Prerequisites
```python
llm = Llama(
    model_path="/home/axyo/dev/LLM/models/Meta-Llama-3-8B-Instruct-GGUF-v2/Meta-Llama-3-8B-Instruct-v2.Q5_0.gguf",
    n_gpu_layers=-1,
    seed=8,
    n_ctx=4096,
    logits_all=True,
    kv_overrides={"tokenizer.ggml.eos_token_id": 128002},
)
prompt = """user What is a dog?assistant A dog, also known as Canis lupus familiaris, is...
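The snippet above overrides a GGUF metadata key via `kv_overrides`. As a hedged sketch of what such overrides involve, llama.cpp expects each override value to carry one of a few concrete types; the helper below (a hypothetical name, not part of llama-cpp-python's API) shows one way to tag a Python override dict with inferred types:

```python
# Hedged sketch: tag each kv_overrides entry with the value type
# (bool / int / float / str) that llama.cpp distinguishes between.
# `coerce_overrides` is hypothetical and for illustration only.
from typing import Dict, Tuple, Union

Override = Union[int, float, bool, str]

def coerce_overrides(overrides: Dict[str, Override]) -> Dict[str, Tuple[str, Override]]:
    """Tag each override with its inferred value type."""
    typed: Dict[str, Tuple[str, Override]] = {}
    for key, value in overrides.items():
        if isinstance(value, bool):       # must check bool before int
            typed[key] = ("bool", value)
        elif isinstance(value, int):
            typed[key] = ("int", value)
        elif isinstance(value, float):
            typed[key] = ("float", value)
        elif isinstance(value, str):
            typed[key] = ("str", value)
        else:
            raise ValueError(f"unsupported override type for {key!r}")
    return typed

print(coerce_overrides({"tokenizer.ggml.eos_token_id": 128002}))
```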

bug

**Description:** I am trying to install llama-cpp-python with CUDA support, however I run into build errors. All the information is attached below. I can install it without GPU support just...

It could be cool to support audio capabilities, as they are now experimentally implemented in llama.cpp for models such as qwen-2.5-omni :)

This issue concerns the llama-cpp-python community but was filed on the llama.cpp tracker first: https://github.com/ggml-org/llama.cpp/issues/14847. I just wanted to bring it to your attention. I can relocate the issue if...

This is a PR to add support for loading and changing LoRA adapters at runtime as introduced into llama.cpp in https://github.com/ggerganov/llama.cpp/pull/8332 by @ngxson. Adding this support should allow things like...
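The workflow this PR enables is: load one or more LoRA adapters once, then enable, rescale, or disable them per request without reloading the base model. The sketch below only mirrors that shape in plain Python; the class and method names are hypothetical, not llama-cpp-python's actual bindings (which wrap llama.cpp's `llama_lora_adapter_*` functions):

```python
# Hedged sketch of a runtime LoRA adapter registry. All names here
# (AdapterRegistry, load, set_scale, clear) are hypothetical and
# only illustrate the load-once / switch-at-runtime workflow.
from typing import Dict

class AdapterRegistry:
    def __init__(self) -> None:
        self._loaded: Dict[str, str] = {}    # name -> adapter file path
        self._active: Dict[str, float] = {}  # name -> applied scale

    def load(self, name: str, path: str) -> None:
        """Load an adapter once; it can be applied repeatedly later."""
        self._loaded[name] = path

    def set_scale(self, name: str, scale: float) -> None:
        """Apply a loaded adapter at the given scale for the next requests."""
        if name not in self._loaded:
            raise KeyError(f"adapter {name!r} not loaded")
        self._active[name] = scale

    def clear(self) -> None:
        """Disable all adapters, falling back to the base model."""
        self._active.clear()

reg = AdapterRegistry()
reg.load("style", "adapters/style.gguf")  # hypothetical path
reg.set_scale("style", 0.8)
```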