
LLM inference in C/C++

Results: 1628 llama.cpp issues

### Prerequisites - [x] I am running the latest code. Mention the version if possible as well. - [x] I carefully followed the [README.md](https://github.com/ggml-org/llama.cpp/blob/master/README.md). - [x] I searched using keywords...

enhancement

Add ChatPDFLocal, a macOS app for chatting with local PDFs, which uses llama.cpp to run LLMs on Mac.

### Name and Version using the docker image ### Operating systems Linux ### GGML backends CUDA ### Hardware L40S ### Models Qwen2.5-32B ### Problem description & steps to reproduce --api-key...

bug-unconfirmed

This PR introduces SVE (Scalable Vector Extension) kernel support for the q3_K_q8_K vector dot product on the Arm architecture. A similar proposal for SVE support was made in PR https://github.com/ggerganov/llama.cpp/pull/7433...

ggml

### What happened? Turning on flash attention degrades performance under ROCm (at least it does with a 7900 XTX). Using batched bench, the degradation is quite minor...

bug-unconfirmed
medium severity

### Prerequisites - [x] I am running the latest code. Mention the version if possible as well. - [x] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md). - [x] I searched using keywords...

enhancement

### Name and Version alex@alexdeMacBook-Air ~ % llama-cli --version version: 4731 (0f2bbe65) built with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin23.6.0 alex@alexdeMacBook-Air ~ % ### Operating systems Mac ### Which...

bug-unconfirmed

*Make sure to read the [contributing guidelines](https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md) before submitting a PR* Implement `support_op` in the RPC client. Not sure if this is a good approach to achieve it.

ggml

Relates to: https://github.com/ggml-org/llama.cpp/issues/11178 Added a --chat-template-file CLI option to llama-run. If specified, the file is read and its content is passed to common_chat_templates_from_model, overriding the model's chat template....

examples
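As an illustration of what such a template file might contain: chat templates in the GGUF ecosystem are Jinja-style, so a ChatML-style file could look like the sketch below. This is an assumption for illustration only, not taken from the PR; the correct template depends on the model.

```jinja
{# Illustrative ChatML-style chat template (assumption, not from the PR) #}
{%- for message in messages -%}
<|im_start|>{{ message.role }}
{{ message.content }}<|im_end|>
{%- endfor -%}
{%- if add_generation_prompt -%}
<|im_start|>assistant
{%- endif -%}
```

A file like this would then be supplied via the new flag, e.g. `llama-run --chat-template-file template.jinja ...` (invocation shape assumed from the PR description).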

### Name and Version version: 4733 (faaa9b93) built with cc (Ubuntu 14.2.0-4ubuntu2) 14.2.0 for aarch64-linux-gnu ### Operating systems Linux ### GGML backends CPU ### Hardware Tested on Snapdragon X Elite...

bug-unconfirmed