
LLM inference in C/C++

Results: 1637 llama.cpp issues (sorted by recently updated)

To prepare for an 'abort-free' ggml, as agreed with Diego in the ggml repo, upgrade the backend init_tensor APIs to return a ggml_status. *Make sure to read the [contributing guidelines](https://github.com/ggerganov/llama.cpp/blob/master/CONTRIBUTING.md)...

Labels: testing, Nvidia GPU, Vulkan, ggml, SYCL
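The change described above means a backend's init_tensor callback can report failure to its caller instead of aborting the process. A minimal sketch of what that might look like, using a hypothetical backend and a hypothetical allocation helper (only the return type change and the ggml_status codes come from the issue; everything else is assumed):

```
#include "ggml-backend.h"

// before: void             (*init_tensor)(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor);
// after:  enum ggml_status (*init_tensor)(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor);

static enum ggml_status my_backend_buffer_init_tensor(
        ggml_backend_buffer_t buffer, struct ggml_tensor * tensor) {
    // hypothetical per-tensor allocation that can fail
    void * extra = my_backend_alloc_tensor_extra(buffer, tensor);
    if (extra == NULL) {
        return GGML_STATUS_ALLOC_FAILED; // report the error instead of aborting
    }
    tensor->extra = extra;
    return GGML_STATUS_SUCCESS;
}
```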

### Git commit d774ab3acc4fee41fbed6dbfc192b57d5f79f34b ### Operating systems Linux ### GGML backends CUDA ### Problem description & steps to reproduce I am trying to compile llama.cpp on a Linux system. Steps:...

Labels: bug-unconfirmed

### Feature Description New 7B coding model just released by Mistral. - **Blog Post**: https://mistral.ai/news/codestral-mamba/ - **HF**: https://huggingface.co/mistralai/mamba-codestral-7B-v0.1 ### Motivation Seems to perform very well, especially for a 7B model:...

Labels: enhancement

Labels: bug-unconfirmed

This commit adds a condition to check whether the json_schema is null before adding the grammar and grammar_triggers to the llama_params. The motivation is to prevent the server...

Labels: examples, server
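The guard is simple in shape; here is a minimal sketch using nlohmann::json (the JSON library the server uses), with assumed field names taken from the PR text and a placeholder in place of the server's actual schema-to-grammar conversion:

```
#include <nlohmann/json.hpp>
using json = nlohmann::json;

// Sketch: only forward grammar settings when a usable json_schema was sent.
static void add_grammar_params(const json & body, json & llama_params) {
    if (!body.contains("json_schema") || body.at("json_schema").is_null()) {
        return; // "json_schema": null (or absent) -> leave grammar params unset
    }
    // placeholder for the server's real schema-to-grammar handling
    llama_params["grammar"]          = body.at("json_schema");
    llama_params["grammar_triggers"] = json::array();
}
```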

When the metrics endpoint is enabled, and **before** any completion request has been made, the returned metrics include a `-nan` value:
```
llamacpp:n_busy_slots_per_decode -nan
```
Looking at the source code, it...
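This looks like a 0/0: before the first decode, both counters in the ratio are zero, and dividing them yields NaN in the Prometheus output. A minimal sketch of a guard, with counter names assumed from the metric name:

```
#include <cstdint>

// Sketch: guard the ratio so an idle server reports 0 instead of -nan.
static double n_busy_slots_per_decode(uint64_t n_busy_slots_total, uint64_t n_decode_total) {
    if (n_decode_total == 0) {
        return 0.0; // no decodes yet -> avoid 0.0 / 0.0 == NaN
    }
    return static_cast<double>(n_busy_slots_total) / static_cast<double>(n_decode_total);
}
```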

### Name and Version commit hash: a4f011e8d02179f032627130f961eb77ee30401c ### Operating systems Other? (Please let us know in description) ### GGML backends CPU ### Hardware Snapdragon 8 Gen 2 ### Models Model...

Labels: bug-unconfirmed

I have two machines with 2 * 8 * A800 GPUs and want to deploy a GGUF model across both machines. Does llama.cpp support multi-node, multi-GPU deployment? **If so, how can I...

### Discussed in https://github.com/ggerganov/llama.cpp/discussions/11858 Originally posted by **vino5211** February 14, 2025 I have 4 GPU servers A, B, C, D, each with 4 NVIDIA A800 80GB PCIe cards. I start rpc-server on B, C, and D. Here...

### Name and Version Model: NexaAIDev/OmniVLM-968M In the function clip_image_batch_encode, encoding the image takes about 18 seconds, far longer than the LLM part, which takes about 1.5 seconds. ggml_compute_forward compute...

Labels: bug-unconfirmed