llama.cpp
LLM inference in C/C++
OpenBLAS was enabled at compile time, but there seems to be no acceleration effect during inference: compared to a build without OpenBLAS, it only increases memory usage. What is the...
Change MAX GPU+CPU from 16 to 64 *Make sure to read the [contributing guidelines](https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md) before submitting a PR*
Motivation: `ggml_is_view_op` is a useful API.

### Use case 1

It is used by `test-backend-ops.cpp`.

### Use case 2

https://github.com/ggml-org/llama.cpp/blob/73e2ed3ce3492d3ed70193dd09ae8aa44779651d/src/llama.cpp#L8178-L8179

Let's say `cur` is a view operation and its source...
### Background Description

Ref: https://github.com/ggerganov/llama.cpp/pull/7553, required for supporting future vision models (https://github.com/ggerganov/llama.cpp/issues/8010). I initially planned to make a proposal PR for this, but it turns out to be quite a bit more complicated...
### Prerequisites

- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md).
- [X] I searched using keywords...
Contributors, what's the point of truncating to a strict 32-bit size_t value and then comparing it to an int64_t? Is this legacy cast code left over from when the code was rewritten to 64-bit types, or is...
## Problem Description

The test command for the llama-server target has a duplicated file extension, which prevents the test script from correctly locating the test file....
### Prerequisites

- [x] I am running the latest code. Mention the version if possible as well.
- [x] I carefully followed the [README.md](https://github.com/ggml-org/llama.cpp/blob/master/README.md).
- [x] I searched using keywords...
### Git commit

73e2ed3ce3492d3ed70193dd09ae8aa44779651d

### Operating systems

Linux

### GGML backends

CUDA

### Problem description & steps to reproduce

I am trying to host a model using llama-server. I successfully built llama.cpp...