llama-cpp-python issues

Results 424 llama-cpp-python issues

Retrieve attention score for all input tokens per generated token

**Is your feature request related to a problem? Please describe.** In RAG scenarios, I think it would be a great help to differentiate whether an LLM is hallucinating or retrieving its...

parallaxe

enhancement

question

How to install the latest version with GPU support

Hey, I've been struggling for a month to install the latest version with CUDA. It was a nightmare. So here is a guide on how to do that. tl;dr Docker syntax:...

shigabeev

llama-server not using GPU

After I install `llama-cpp-python-server` with CUDA support and run `python3 -m llama_cpp.server --model starcoderbase-3b/starcoderbase-3b.Q4_K_M.gguf --n_gpu_layers 10`, the GPU is not used; the model runs on the CPU.

RakshitAralimatti

Assertion error when offloading Llama 4 layers to CPU

# Prerequisites

Please answer the following questions for yourself before submitting an issue.

- [x] I am running the latest code. Development is very rapid so there are no tagged...

BrianStucky-USDA
