
LLM inference in C/C++

Results: 1641 llama.cpp issues, sorted by recently updated.

### Name and Version Of course it's probably just false positives, but I'll share here anyway. When trying to download it manually, the llama.cpp releases themselves are getting flagged: ![image.png](https://cdn-uploads.huggingface.co/production/uploads/65d4cf2693a0a3744a27536c/wYIEB8Dn5WefddjY_gZcI.png)...

bug-unconfirmed
stale

Hi, in Ollama you can browse previous messages with the up/down arrow keys, as in the Linux terminal. Could you implement this feature in llama.cpp? Thanks

stale
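The history behavior requested above is essentially what GNU readline provides. A minimal sketch, assuming GNU readline is installed and the program is linked with `-lreadline` (this is only an illustration of the idea, not llama.cpp's actual CLI code):

```cpp
// Minimal sketch, assuming GNU readline (link with -lreadline).
// Illustration only -- not llama.cpp's interactive loop.
#include <cstdio>
#include <cstdlib>
#include <readline/readline.h>
#include <readline/history.h>

int main() {
    while (char * line = readline("> ")) {   // readline handles arrow-key editing and history recall
        if (*line) {
            add_history(line);               // make this entry reachable with the up arrow
        }
        std::printf("you typed: %s\n", line);
        std::free(line);                     // readline() returns a malloc'd buffer
    }
    return 0;
}
```

Whether llama.cpp would want readline as a dependency, or something lighter such as linenoise, is a separate design question; the sketch only shows the requested behavior.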

### Name and Version ./build/bin/llama-cli --version version: 4466 (924518e2) built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for aarch64-linux-gnu ### Operating systems Linux ### GGML backends CUDA ### Hardware jetson agx orin...

bug-unconfirmed
stale

### Name and Version $ ./build/bin/llama-cli --version ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Apple M1 Pro (MoltenVK) | uma: 1 | fp16: 1 | warp size: 32...

bug-unconfirmed
stale

Users can manually specify how much memory a device should be assumed to have via the `GGML_VK_DEVICE{idx}_MEMORY` environment variable, so the workload is allocated accordingly. For example, setting `GGML_VK_DEVICE0_MEMORY=2000000000` configures...

Vulkan
ggml
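For context, the override described above is read from the environment at startup. A rough, hypothetical sketch of how such a per-device byte budget could be parsed (the helper name and logic are illustrative assumptions, not the actual ggml Vulkan backend code):

```cpp
// Hypothetical helper: read a GGML_VK_DEVICE{idx}_MEMORY override, in bytes.
// Illustrative only -- not the actual ggml Vulkan backend implementation.
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>

static std::optional<uint64_t> vk_device_memory_override(int idx) {
    const std::string name = "GGML_VK_DEVICE" + std::to_string(idx) + "_MEMORY";
    if (const char * val = std::getenv(name.c_str())) {
        return std::strtoull(val, nullptr, 10);   // e.g. "2000000000" -> roughly a 2 GB budget
    }
    return std::nullopt;                          // unset: fall back to the device's reported memory
}
```

With `GGML_VK_DEVICE0_MEMORY=2000000000` in the environment, device 0 would then be treated as having roughly 2 GB available when the workload is split across devices.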

### Prerequisites - [x] I am running the latest code. Mention the version if possible as well. - [x] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md). - [x] I searched using keywords...

enhancement

### Problem Description Hi, is there any support for the OpenAI API capability provided by vLLM? I want to test some models with browser use, like the qwen-vl model. The only way...

bug-unconfirmed

Hi all, this PR takes the ideas applied to the Vulkan backend (#9118 and #10499) and implements them for CUDA. This results in improved tokens-per-second performance. ### Performance...

Nvidia GPU
ggml

### Git commit `llama.cpp` bundled with `ollama-0.5.11`: https://github.com/ollama/ollama/tree/v0.5.11/llama/llama.cpp… which I _think_ is 8962422 (from https://github.com/ollama/ollama/commit/5e2653f9fe454e948a8d48e3c15c21830c1ac26b) ### Operating systems Linux ### GGML backends CPU ### Problem description & steps to reproduce...

bug-unconfirmed

### Prerequisites - [x] I am running the latest code. Mention the version if possible as well. - [x] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md). - [x] I searched using keywords...

enhancement