
LLM inference in C/C++

Results: 2083 llama.cpp issues, sorted by recently updated

Refactored git/file handling into specialized classes to support arbitrary input files. For now it falls back to JSONL if sqlite fails, but there is also support for (multiple)...

script
python

### Feature Description I do not know if it is possible, but it may be nice to support a draft model in more places. If I am not wrong, only the...

enhancement

### Name and Version ``` $ bin/llama-mtmd-cli --version version: 5343 (62d4250e) built with cc (GCC) 15.1.1 20250425 (Red Hat 15.1.1-1) for x86_64-redhat-linux ``` ### Operating systems Linux ### GGML backends...

bug-unconfirmed

Possibly supersedes #16813. This PR adds support for running concurrent CUDA streams on single-GPU setups. At the moment this only targets the Q, K, V branch. I feel this...

Nvidia GPU
ggml

### Git commit Marks-MacBook-Air:llama.cpp mjgs$ git rev-parse HEAD aa3ee0eb0b80efca126cedf9bcb4fb5864b46ce3 ### Operating systems Mac ### GGML backends Vulkan ### Problem description & steps to reproduce I'm trying to compile llama.cpp with...

bug-unconfirmed
stale

### Name and Version ilintar@LinuksowaJaskinia:/mnt/win/k/models/unsloth/granite-4.0-h-small-GGUF$ llama-cli --version load_backend: loaded BLAS backend from /devel/tools/llama.cpp/build/bin/libggml-blas.so register_backend: registered backend BLAS (1 devices) register_device: registered device BLAS (OpenBLAS) ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no...

bug-unconfirmed
stale

### Name and Version Why would anyone implement the syntax-error checking of the escaped JSON inside the LLM response in a way that does not work? What was the...

bug-unconfirmed
stale

### Name and Version vulkan: version: 6719 (aa4711d3) built with cc (GCC) 15.2.1 20250813 for x86_64-pc-linux-gnu sycl: version: 6719 (aa4711d3) built with Intel(R) oneAPI DPC++/C++ Compiler 2025.0.4 (2025.0.4.20241205) for x86_64-unknown-linux-gnu...

bug-unconfirmed
stale

This PR builds on and supersedes https://github.com/ggml-org/llama.cpp/pull/15826 from @ngxson. * Adds `__EMSCRIPTEN__` preprocessor conditionals where necessary for compilation (this included some OS-specific things in `common/`) * Adds flags...

build
script
testing
devops
ggml

A missing `break` causes the SSM_CONV unit-test case to fail.

ggml
SYCL