llama.cpp
LLM inference in C/C++
OpenBLAS was enabled at compile time, but there seems to be no acceleration effect during inference: compared to a build without OpenBLAS, it only increases memory usage. What is the...
Change MAX GPU+CPU from 16 to 64 *Make sure to read the [contributing guidelines](https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md) before submitting a PR*
Motivation: `ggml_is_view_op` is a useful API.

### Use case 1

It is used by `test-backend-ops.cpp`.

### Use case 2

https://github.com/ggml-org/llama.cpp/blob/73e2ed3ce3492d3ed70193dd09ae8aa44779651d/src/llama.cpp#L8178-L8179

Let's say `cur` is a view operation and its source...
### Background Description

Ref: https://github.com/ggerganov/llama.cpp/pull/7553, required for supporting future vision models (https://github.com/ggerganov/llama.cpp/issues/8010). I initially planned to make a proposal PR for this, but it turns out to be quite a bit more complicated...
### Prerequisites

- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md).
- [X] I searched using keywords...
Contributors, what's the point of truncating to a strict 32-bit size_t value and then comparing it to an int64_t? Is this legacy cast code left over from when the code was rewritten to 64-bit types, or is...
## Problem Description

The test command for the llama-server target has a duplicated file extension, which prevents the test script from correctly locating the test file....
### Prerequisites

- [x] I am running the latest code. Mention the version if possible as well.
- [x] I carefully followed the [README.md](https://github.com/ggml-org/llama.cpp/blob/master/README.md).
- [x] I searched using keywords...
### Git commit

73e2ed3ce3492d3ed70193dd09ae8aa44779651d

### Operating systems

Linux

### GGML backends

CUDA

### Problem description & steps to reproduce

I am trying to host a model using llama-server. I successfully built llama.cpp...