llama.cpp
LLM inference in C/C++
With these changes, the llama3.2 model can be converted to big endian.
Misc. bug: The KV cache is sometimes truncated incorrectly when making v1/chat/completions API calls
### Name and Version > .\llama-server.exe --version ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 2 CUDA devices: Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes Device...
I'm using llama.cpp to deploy the deepseek-r1-671B-Q4_0 weights, but I found the documentation/README.md is barely detailed; I even had to read the source to understand what would happen if I make some...
### Name and Version ./llama-server --version version: 4607 (aa6fb132) built with Apple clang version 15.0.0 (clang-1500.1.0.2.5) for arm64-apple-darwin23.4.0 ### Operating systems Mac ### Which llama.cpp modules do you know to...
### Name and Version on latest commit ce8784bdb153ff7794dde5a50b0ebfa51baa6171 but have been noticing it for several days now ### Operating systems _No response_ ### Which llama.cpp modules do you know to...
This PR adds a bullet point to the contributing guidelines stating that PRs should not contain multiple unrelated features.
### Git commit 4418 ### Operating systems BSD ### GGML backends CPU ### Problem description & steps to reproduce [This -D_XOPEN_SOURCE=600 argument](https://github.com/ggerganov/llama.cpp/blob/master/Makefile#L286) breaks compilation: ``` In file included from /usr/ports/misc/llama-cpp/work/llama.cpp-b4418/ggml/src/ggml-vulkan/ggml-vulkan.cpp:8:...
### Name and Version ./build/bin/llama-cli --version version: 4731 (0f2bbe65) built with cc (conda-forge gcc 12.2.0-19) 12.2.0 for aarch64-conda-linux-gnu ### Operating systems Linux ### GGML backends CPU ### Hardware ### NPU(8...
### Name and Version ./llama-cli --version ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 CUDA devices: Device 0: NVIDIA GeForce RTX 2080 Ti, compute capability 7.5, VMM: yes version:...
Remove an unused header file that causes a compilation failure on ARM platforms with GCC 13.