
LLM inference in C/C++

Results: 1628 llama.cpp issues

### Prerequisites - [x] I am running the latest code. Mention the version if possible as well. - [x] I carefully followed the [README.md](https://github.com/ggml-org/llama.cpp/blob/master/README.md). - [x] I searched using keywords...

enhancement

Add ChatPDFLocal, a macOS app for chatting with local PDFs, which uses llama.cpp to run LLMs on Mac.

### Name and Version using the docker image ### Operating systems Linux ### GGML backends CUDA ### Hardware L40S ### Models Qwen2.5-32B ### Problem description & steps to reproduce --api-key...

bug-unconfirmed

This PR introduces SVE (Scalable Vector Extension) kernel support for the q3_K_q8_K vector dot product on the Arm architecture. A similar proposal for SVE support was made in PR https://github.com/ggerganov/llama.cpp/pull/7433...

ggml

### What happened? Turning on flash attention degrades performance under ROCm (at least it does with a 7900 XTX). Using batched bench, the degradation is quite minor...

bug-unconfirmed
medium severity

### Prerequisites - [x] I am running the latest code. Mention the version if possible as well. - [x] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md). - [x] I searched using keywords...

enhancement

### Name and Version alex@alexdeMacBook-Air ~ % llama-cli --version version: 4731 (0f2bbe65) built with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin23.6.0 alex@alexdeMacBook-Air ~ % ### Operating systems Mac ### Which...

bug-unconfirmed

*Make sure to read the [contributing guidelines](https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md) before submitting a PR* Implement `support_op` in the RPC client. Not sure if this is a good approach to achieve it.

ggml

Relates to: https://github.com/ggml-org/llama.cpp/issues/11178 Added a --chat-template-file CLI option to llama-run. If specified, the file is read and its content is passed to common_chat_templates_from_model, overriding the model's chat template....

examples
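As an illustration of what such a template file might contain: chat templates in the GGUF ecosystem are Jinja-style, so a ChatML-style file could look like the sketch below. This is an assumption for illustration only, not taken from the PR; the correct template depends on the model.

```jinja
{# Illustrative ChatML-style chat template (assumption, not from the PR) #}
{%- for message in messages -%}
<|im_start|>{{ message.role }}
{{ message.content }}<|im_end|>
{%- endfor -%}
{%- if add_generation_prompt -%}
<|im_start|>assistant
{%- endif -%}
```

A file like this would then be supplied via the new flag, e.g. `llama-run --chat-template-file template.jinja ...` (invocation shape assumed from the PR description).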

### Name and Version version: 4733 (faaa9b93) built with cc (Ubuntu 14.2.0-4ubuntu2) 14.2.0 for aarch64-linux-gnu ### Operating systems Linux ### GGML backends CPU ### Hardware Tested on Snapdragon X Elite...

bug-unconfirmed