
LLM inference in C/C++

Results: 1637 llama.cpp issues (sorted by recently updated)

To prepare for an 'abort-free' ggml, as agreed with Diego in the ggml repo, upgrade the backend init_tensor APIs to return a ggml_status. *Make sure to read the [contributing guidelines](https://github.com/ggerganov/llama.cpp/blob/master/CONTRIBUTING.md)...

Labels: testing, Nvidia GPU, Vulkan, ggml, SYCL
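The change described above means a backend's init_tensor callback can report failure to its caller instead of aborting the process. A minimal sketch of what that might look like, using a hypothetical backend and a hypothetical allocation helper (only the return type change and the ggml_status codes come from the issue; everything else is assumed):

```
#include "ggml-backend.h"

// before: void             (*init_tensor)(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor);
// after:  enum ggml_status (*init_tensor)(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor);

static enum ggml_status my_backend_buffer_init_tensor(
        ggml_backend_buffer_t buffer, struct ggml_tensor * tensor) {
    // hypothetical per-tensor allocation that can fail
    void * extra = my_backend_alloc_tensor_extra(buffer, tensor);
    if (extra == NULL) {
        return GGML_STATUS_ALLOC_FAILED; // report the error instead of aborting
    }
    tensor->extra = extra;
    return GGML_STATUS_SUCCESS;
}
```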

### Git commit d774ab3acc4fee41fbed6dbfc192b57d5f79f34b ### Operating systems Linux ### GGML backends CUDA ### Problem description & steps to reproduce I am trying to compile llama.cpp on a Linux system. Steps:...

Labels: bug-unconfirmed

### Feature Description New 7B coding model just released by Mistral. - **Blog Post**: https://mistral.ai/news/codestral-mamba/ - **HF**: https://huggingface.co/mistralai/mamba-codestral-7B-v0.1 ### Motivation Seems to perform very well, especially for a 7B model:...

Labels: enhancement

Labels: bug-unconfirmed

This commit adds a condition to check whether the json_schema is null before adding the grammar and grammar_triggers to the llama_params. The motivation is to prevent the server...

Labels: examples, server
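The guard is simple in shape; here is a minimal sketch using nlohmann::json (the JSON library the server uses), with assumed field names taken from the PR text and a placeholder in place of the server's actual schema-to-grammar conversion:

```
#include <nlohmann/json.hpp>
using json = nlohmann::json;

// Sketch: only forward grammar settings when a usable json_schema was sent.
static void add_grammar_params(const json & body, json & llama_params) {
    if (!body.contains("json_schema") || body.at("json_schema").is_null()) {
        return; // "json_schema": null (or absent) -> leave grammar params unset
    }
    // placeholder for the server's real schema-to-grammar handling
    llama_params["grammar"]          = body.at("json_schema");
    llama_params["grammar_triggers"] = json::array();
}
```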

When the metrics endpoint is enabled, and **before** any completion request has been made, the returned metrics include a `-nan` value:
```
llamacpp:n_busy_slots_per_decode -nan
```
Looking at the source code, it...
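This looks like a 0/0: before the first decode, both counters in the ratio are zero, and dividing them yields NaN in the Prometheus output. A minimal sketch of a guard, with counter names assumed from the metric name:

```
#include <cstdint>

// Sketch: guard the ratio so an idle server reports 0 instead of -nan.
static double n_busy_slots_per_decode(uint64_t n_busy_slots_total, uint64_t n_decode_total) {
    if (n_decode_total == 0) {
        return 0.0; // no decodes yet -> avoid 0.0 / 0.0 == NaN
    }
    return static_cast<double>(n_busy_slots_total) / static_cast<double>(n_decode_total);
}
```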

### Name and Version commit hash: a4f011e8d02179f032627130f961eb77ee30401c ### Operating systems Other? (Please let us know in description) ### GGML backends CPU ### Hardware Snapdragon 8 Gen 2 ### Models Model...

Labels: bug-unconfirmed

I have two machines with 2 * 8 * A800 GPUs and want to deploy a GGUF model across both machines. Does llama.cpp support multi-node, multi-GPU deployment? **If so, how can I...

### Discussed in https://github.com/ggerganov/llama.cpp/discussions/11858 Originally posted by **vino5211** February 14, 2025 I have 4 GPU servers A, B, C, D, each with 4 NVIDIA A800 80GB PCIe cards. I start rpc-server on B, C, and D. Here...

### Name and Version Model: NexaAIDev/OmniVLM-968M In the function clip_image_batch_encode, encoding the image takes about 18 seconds, far longer than the LLM part, which takes about 1.5 seconds. ggml_compute_forward compute...

Labels: bug-unconfirmed