
LLM inference in C/C++

Results: 1628 llama.cpp issues, sorted by recently updated

### Name and Version
PS C:\src\local_llm> C:\local_llm_models\llama.cpp\build\bin\Release\llama-cli.exe --version
version: 4731 (0f2bbe65)
built with MSVC 19.42.34436.0 for x64
### Operating systems
Windows
### GGML backends
CPU
### Hardware
CPU: AMD EPYC...

bug-unconfirmed

### Name and Version
ggml_sycl_init: GGML_SYCL_FORCE_MMQ: no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
version: 4404 (0827b2c1) built with MSVC 19.42.34435.0
### Operating systems
Windows
### Which llama.cpp modules...

bug-unconfirmed
stale

### Git commit
https://github.com/ggerganov/llama.cpp/commit/43ed389a3f102517e6f7d5620d8e451e88afbf27
### Operating systems
Mac
### GGML backends
Metal
### Problem description & steps to reproduce
Related to https://github.com/ggerganov/llama.cpp/issues/10747. I have followed the CI action for...

bug-unconfirmed

This adds a workaround for https://github.com/ggml-org/llama.cpp/issues/11949. Once https://github.com/ROCm/clr/issues/138 is fixed, this workaround will introduce a memory leak; I promise to restrict the workaround to versions with the bug as soon as...

Nvidia GPU
ggml

Relates to: https://github.com/ggml-org/llama.cpp/issues/11178. Adds a `--chat-template-file` CLI option to llama-run. If specified, the file is read and its content is passed to common_chat_templates_from_model, overriding the model's chat template...
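A minimal usage sketch (the model and template paths here are hypothetical; only the `--chat-template-file` flag itself comes from this PR):

```shell
# Assumed invocation: point llama-run at a Jinja chat template on disk
# instead of the template embedded in the GGUF model.
llama-run --chat-template-file ./chat-template.jinja ./models/example.gguf "Hello!"
```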

examples

### Name and Version
version: 4462 (c05e8c99) built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
### Operating systems
Linux
### GGML backends
CPU
### Hardware
N/A
### Models
N/A
### ...

bug

### Name and Version
```shell
./llama.cpp/build/bin/llama-server \
    -m /models/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
    --cache-type-k q4_0 \
    --threads 64 \
    --temp 0.6 \
    --ctx-size 12288 \
    --parallel 3 \
    --n-gpu-layers 62
```
### Operating...

bug-unconfirmed

Use the consolidated open function call from the File class. Change read_all to to_string(). Remove exclusive locking; the intent of that lock is to avoid multiple processes writing to the same file,...

examples

### Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA A100-PCIE-40GB, compute capability 8.0, VMM: yes
register_backend: registered backend CUDA (1 devices)...

bug-unconfirmed

This PR addresses an issue where text input in the server (webui) was inadvertently submitted during IME conversion. The fix ensures that inputs are processed only after the conversion is...
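A minimal sketch of the standard guard for this class of bug (the handler and callback names are hypothetical, not taken from the actual webui code): key events fired during IME conversion carry `isComposing: true` (or the legacy keyCode 229 in some browsers), so Enter should be ignored until composition ends.

```typescript
// Hypothetical keydown handler illustrating the usual IME guard:
// ignore Enter while an IME composition session is still active.
function onPromptKeydown(event: KeyboardEvent, submit: () => void): void {
  // `isComposing` is true for key events fired during IME conversion;
  // some browsers instead report the legacy keyCode 229 for these events.
  if (event.isComposing || event.keyCode === 229) {
    return; // let the IME finish converting before accepting Enter
  }
  if (event.key === "Enter" && !event.shiftKey) {
    event.preventDefault();
    submit(); // conversion is done, safe to send the input
  }
}
```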

examples
server