llama.cpp
LLM inference in C/C++
### Git commit
https://github.com/ggml-org/llama.cpp Branch: master

### Operating systems
Windows

### GGML backends
CUDA

### Problem description & steps to reproduce
Ref:
1. https://gorilla.cs.berkeley.edu/blogs/5_how_to_gorilla.html#integrate-third-party
2. https://github.com/ggml-org/llama.cpp Step V, B(5b)

Command: Run...
### Prerequisites

- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md).
- [X] I searched using keywords...
Misc. bug: RPC attempt fails with a specific error, but I cannot find any info on troubleshooting it
### Name and Version

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3070, compute capability 8.6, VMM: yes
version: 4735 (73e2ed3ce)
built...
- Listens for a "setText" command from the parent, carrying "text" and "context" fields: "text" is written into inputMsg, while "context" is used as hidden context on subsequent requests to the llama.cpp server (sketched below)...
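A minimal TypeScript sketch of the flow described above, under stated assumptions: only the "setText" command, its "text"/"context" fields, and the `inputMsg` name come from the description; the message transport, element lookup, and the llama.cpp server endpoint/payload shape are illustrative assumptions, not the component's actual implementation.

```ts
// Hypothetical wiring for the "setText" handler described above.
let hiddenContext = "";

// "inputMsg" is named in the source; assuming it is a text input element.
const inputMsg = document.getElementById("inputMsg") as HTMLInputElement;

window.addEventListener("message", (event: MessageEvent) => {
  const msg = event.data;
  if (msg && msg.command === "setText") {
    inputMsg.value = msg.text;   // visible text shown to the user
    hiddenContext = msg.context; // kept out of the UI, reused below
  }
});

// Subsequent requests prepend the hidden context to the user's prompt.
// Endpoint and payload shape are assumptions based on llama.cpp's HTTP server.
async function sendPrompt(userText: string): Promise<string> {
  const res = await fetch("http://localhost:8080/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt: `${hiddenContext}\n${userText}` }),
  });
  const data = await res.json();
  return data.content as string;
}
```

Keeping the context out of the visible input while splicing it into every prompt matches the "hidden context" behavior the description names; the fetch call is only one plausible way to reach the server.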
This commit adjusts the indentation of the functions `parse_sequence` and `parse_rule` in src/llama-grammar.cpp. The motivation is consistency and improved readability.
### Prerequisites

- [x] I am running the latest code. Mention the version if possible as well.
- [x] I carefully followed the [README.md](https://github.com/ggml-org/llama.cpp/blob/master/README.md).
- [x] I searched using keywords...
### Name and Version

`llama-server --version`

```
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 2080 Ti, compute capability 7.5, VMM: yes...
```
### Name and Version
llama.cpp-b3999

### Operating systems
Windows

### GGML backends
CUDA

### Hardware
2x RTX 3090, i7-7820X

### Models
cgato/Nemo-12b-Humanize-KTO-v0.1
bartowski/Nemo-12b-Humanize-KTO-v0.1-GGUF

### Problem description & steps to reproduce
...
Currently, small models like Qwen2.5 0.5B do not work properly with the OpenCL backend. This PR fixes that issue. It also changes the subgroup size to 64 for all Adreno GPUs.