
LLM inference in C/C++

Results: 1641 llama.cpp issues, sorted by recently updated.

### Name and Version Of course it's probably just false positives, but I'll share here anyway. When trying to download it manually, the llama.cpp releases themselves are getting flagged: ![image.png](https://cdn-uploads.huggingface.co/production/uploads/65d4cf2693a0a3744a27536c/wYIEB8Dn5WefddjY_gZcI.png)...

bug-unconfirmed
stale

Hi, in Ollama you can browse previous messages with the up/down arrow keys, as in the Linux terminal. Could you implement this feature in llama.cpp? Thanks

stale
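The history behavior requested above is essentially what GNU readline provides. A minimal sketch, assuming GNU readline is installed and the program is linked with `-lreadline` (this is only an illustration of the idea, not llama.cpp's actual CLI code):

```cpp
// Minimal sketch, assuming GNU readline (link with -lreadline).
// Illustration only -- not llama.cpp's interactive loop.
#include <cstdio>
#include <cstdlib>
#include <readline/readline.h>
#include <readline/history.h>

int main() {
    while (char * line = readline("> ")) {   // readline handles arrow-key editing and history recall
        if (*line) {
            add_history(line);               // make this entry reachable with the up arrow
        }
        std::printf("you typed: %s\n", line);
        std::free(line);                     // readline() returns a malloc'd buffer
    }
    return 0;
}
```

Whether llama.cpp would want readline as a dependency, or something lighter such as linenoise, is a separate design question; the sketch only shows the requested behavior.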

### Name and Version ./build/bin/llama-cli --version version: 4466 (924518e2) built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for aarch64-linux-gnu ### Operating systems Linux ### GGML backends CUDA ### Hardware jetson agx orin...

bug-unconfirmed
stale

### Name and Version $ ./build/bin/llama-cli --version ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Apple M1 Pro (MoltenVK) | uma: 1 | fp16: 1 | warp size: 32...

bug-unconfirmed
stale

Users can manually specify how much memory a device should be assumed to have via the `GGML_VK_DEVICE{idx}_MEMORY` environment variable, so the workload is allocated accordingly. For example, setting `GGML_VK_DEVICE0_MEMORY=2000000000` configures...

Vulkan
ggml
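For context, the override described above is read from the environment at startup. A rough, hypothetical sketch of how such a per-device byte budget could be parsed (the helper name and logic are illustrative assumptions, not the actual ggml Vulkan backend code):

```cpp
// Hypothetical helper: read a GGML_VK_DEVICE{idx}_MEMORY override, in bytes.
// Illustrative only -- not the actual ggml Vulkan backend implementation.
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>

static std::optional<uint64_t> vk_device_memory_override(int idx) {
    const std::string name = "GGML_VK_DEVICE" + std::to_string(idx) + "_MEMORY";
    if (const char * val = std::getenv(name.c_str())) {
        return std::strtoull(val, nullptr, 10);   // e.g. "2000000000" -> roughly a 2 GB budget
    }
    return std::nullopt;                          // unset: fall back to the device's reported memory
}
```

With `GGML_VK_DEVICE0_MEMORY=2000000000` in the environment, device 0 would then be treated as having roughly 2 GB available when the workload is split across devices.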

### Prerequisites - [x] I am running the latest code. Mention the version if possible as well. - [x] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md). - [x] I searched using keywords...

enhancement

### Problem Description Hi, is there any support for the OpenAI API capability provided by vLLM? I want to test some models with browser use, like the qwen-vl model. The only way...

bug-unconfirmed

Hi all, this PR takes the ideas applied to the Vulkan backend (#9118 and #10499) and implements them for CUDA. This results in improved tokens-per-second performance. ### Performance...

Nvidia GPU
ggml

### Git commit `llama.cpp` bundled with `ollama-0.5.11`: https://github.com/ollama/ollama/tree/v0.5.11/llama/llama.cpp… which I _think_ is 8962422 (from https://github.com/ollama/ollama/commit/5e2653f9fe454e948a8d48e3c15c21830c1ac26b) ### Operating systems Linux ### GGML backends CPU ### Problem description & steps to reproduce...

bug-unconfirmed

### Prerequisites - [x] I am running the latest code. Mention the version if possible as well. - [x] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md). - [x] I searched using keywords...

enhancement