
LLM inference in C/C++

Results: 1628 llama.cpp issues, sorted by recently updated

@slaren Honestly, I think Flash Attention should be an optional feature in ggml since it doesn't introduce significant performance improvements, and the binary size has increased considerably—not to mention the...

enhancement
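For context on the runtime side of this discussion: Flash Attention is already a per-context toggle in llama.cpp's public C API, while the binary-size concern above is about the kernels compiled into the ggml backends regardless of that flag. A minimal sketch using the C API as of the builds referenced on this page (newer releases have since renamed some of these functions; the model path is a placeholder):

```c
#include "llama.h"

int main(void) {
    struct llama_model_params mparams = llama_model_default_params();
    struct llama_model * model = llama_load_model_from_file("model.gguf", mparams); // placeholder path
    if (model == NULL) {
        return 1;
    }

    struct llama_context_params cparams = llama_context_default_params();
    cparams.flash_attn = true; // runtime toggle: false falls back to the regular attention path

    struct llama_context * ctx = llama_new_context_with_model(model, cparams);
    if (ctx == NULL) {
        llama_free_model(model);
        return 1;
    }

    // ... run inference ...

    llama_free(ctx);
    llama_free_model(model);
    return 0;
}
```

Note that flipping `flash_attn` off does not shrink the binary: the kernels are still compiled in, which is what the proposal above wants to make optional at build time.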

First encountered when testing https://github.com/ggml-org/llama.cpp/pull/11867, but this is a problem in master too. Debugged to a bug in rocm-clr: https://github.com/ROCm/clr/issues/138. This issue tracks that currently non-default builds with GGML_HIP_GRAPHS=On are...

AMD GPU

### Name and Version

Docker Image: ghcr.io/ggerganov/llama.cpp:full-rocm 4fbeb701689e

```
root@5de0b21ea186:/app# ./llama-cli --version
version: 0 (unknown)
built with AMD clang version 16.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.6.0 23243 be997b2f3651a41597d7a41441fff8ade4ac59ac) for x86_64-unknown-linux-gnu
```

### Operating...

bug-unconfirmed

This commit adds a preset for llama.vim to use the default Qwen 2.5 Coder 1.5B model. The motivation for this change is to make it easier to start a server...

Allow loading little-endian models on big-endian systems. This would allow using any models downloaded via Ollama unmodified.

testing
examples
python
ggml
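The change proposed above amounts to byte-swapping tensor data at load time whenever the file's endianness does not match the host's. A minimal sketch of the idea in C (hypothetical helpers, not llama.cpp's actual loader code; only the 4-byte element case is shown, and quantized block types would need per-field handling):

```c
#include <stdint.h>
#include <stddef.h>

// Hypothetical helper: swap a buffer of 4-byte elements (e.g. f32 or i32
// tensor data) in place, converting little-endian file data for use on a
// big-endian host.
static void byteswap32_inplace(void * data, size_t n_elements) {
    uint32_t * p = (uint32_t *) data;
    for (size_t i = 0; i < n_elements; i++) {
        uint32_t v = p[i];
        p[i] = (v >> 24) | ((v >> 8) & 0x0000FF00u)
             | ((v << 8) & 0x00FF0000u) | (v << 24);
    }
}

// Runtime endianness check (portable C has no standard macro for this).
static int host_is_big_endian(void) {
    const uint16_t one = 1;
    return *(const uint8_t *) &one == 0;
}
```

The labels on this entry (testing, examples, python, ggml) suggest the real change touches both the ggml loader and the Python conversion scripts, since each tensor type needs its own swap logic.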

### Name and Version

ggml_opencl: using kernels optimized for Adreno (GGML_OPENCL_USE_ADRENO_KERNELS)
version: 4727 (c2ea16f2)
built with Android (11349228, +pgo, +bolt, +lto, -mlgo, based on r487747e) clang version 17.0.2 (https://android.googlesource.com/toolchain/llvm-project d9f89f4d16663d5012e5c09495f3b30ece3d2362)...

bug-unconfirmed

### Name and Version
latest version

### Operating systems
Linux

### GGML backends
CUDA

### Hardware
A800-40G

### Models
R1 Q4km

### Problem description & steps to reproduce
GGML_SCHED_MAX_BACKENDS asser...

bug-unconfirmed
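For readers hitting the same assertion: GGML_SCHED_MAX_BACKENDS is a compile-time cap on how many backends (roughly one per GPU plus the CPU) the ggml scheduler can hold in its fixed-size arrays. A sketch of the pattern involved (illustrative struct and helper, not ggml's actual scheduler code; the constant's value here is an assumption):

```c
#include <assert.h>

#define GGML_SCHED_MAX_BACKENDS 16 // assumed value; the real constant lives in ggml's scheduler source

struct sched_like {
    void * backends[GGML_SCHED_MAX_BACKENDS]; // fixed-size storage, no dynamic growth
    int    n_backends;
};

// Hypothetical helper showing the failure mode: registering one backend
// too many trips the assert instead of resizing the array.
static void sched_add_backend(struct sched_like * s, void * backend) {
    assert(s->n_backends < GGML_SCHED_MAX_BACKENDS);
    s->backends[s->n_backends++] = backend;
}
```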

### Name and Version

Following the steps in the [Usage of MiniCPM-o 2.6](https://github.com/ggerganov/llama.cpp/blob/master/examples/llava/README-minicpmo2.6.md#usage-of-minicpm-o-26) section, converting the PyTorch model to GGUF files fails:

```bash
sam@sam-pc:~/workspace/llama.cpp$ python ./examples/llava/minicpmv-surgery.py -m /home/sam/workspace/models/MiniCPM-o-2_6
Traceback...
```

bug-unconfirmed

The notebook https://github.com/werruww/HIGGS/blob/main/bamba_9bgguf%20(1).ipynb does not run.

### Name and Version
version: 4737 (5137da7b)
built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for aarch64-linux-gnu

### Operating systems
Linux

### Which llama.cpp modules do you know to be affected?
llama-cli...

bug-unconfirmed