llama.cpp
LLM inference in C/C++
With these changes, the llama3.2 model can be converted to big endian.
Misc. bug: The KV cache is sometimes truncated incorrectly when making v1/chat/completions API calls
### Name and Version > .\llama-server.exe --version ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 2 CUDA devices: Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes Device...
I'm using llama.cpp to deploy the deepseek-r1-671B-Q4_0 weights, but I found the documentation/README.md is barely detailed; I even had to read the source to understand what would happen if I make some...
### Name and Version ./llama-server --version version: 4607 (aa6fb132) built with Apple clang version 15.0.0 (clang-1500.1.0.2.5) for arm64-apple-darwin23.4.0 ### Operating systems Mac ### Which llama.cpp modules do you know to...
### Name and Version on latest commit ce8784bdb153ff7794dde5a50b0ebfa51baa6171 but have been noticing it for several days now ### Operating systems _No response_ ### Which llama.cpp modules do you know to...
This PR adds a bullet point to the contributing guidelines stating that PRs should not contain multiple unrelated features.
### Git commit 4418 ### Operating systems BSD ### GGML backends CPU ### Problem description & steps to reproduce [This -D_XOPEN_SOURCE=600 argument](https://github.com/ggerganov/llama.cpp/blob/master/Makefile#L286) breaks compilation: ``` In file included from /usr/ports/misc/llama-cpp/work/llama.cpp-b4418/ggml/src/ggml-vulkan/ggml-vulkan.cpp:8:...
### Name and Version ./build/bin/llama-cli --version version: 4731 (0f2bbe65) built with cc (conda-forge gcc 12.2.0-19) 12.2.0 for aarch64-conda-linux-gnu ### Operating systems Linux ### GGML backends CPU ### Hardware ### NPU(8...
### Name and Version ./llama-cli --version ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 CUDA devices: Device 0: NVIDIA GeForce RTX 2080 Ti, compute capability 7.5, VMM: yes version:...
Remove an unused header file that causes a compilation failure on ARM platforms with GCC 13.