
LLM inference in C/C++

Results: 1628 llama.cpp issues, sorted by recently updated

@slaren Honestly, I think Flash Attention should be an optional feature in ggml since it doesn't introduce significant performance improvements, and the binary size has increased considerably—not to mention the...

enhancement
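For context on the runtime side of this discussion: Flash Attention is already a per-context toggle in llama.cpp's public C API, while the binary-size concern above is about the kernels compiled into the ggml backends regardless of that flag. A minimal sketch using the C API as of the builds referenced on this page (newer releases have since renamed some of these functions; the model path is a placeholder):

```c
#include "llama.h"

int main(void) {
    struct llama_model_params mparams = llama_model_default_params();
    struct llama_model * model = llama_load_model_from_file("model.gguf", mparams); // placeholder path
    if (model == NULL) {
        return 1;
    }

    struct llama_context_params cparams = llama_context_default_params();
    cparams.flash_attn = true; // runtime toggle: false falls back to the regular attention path

    struct llama_context * ctx = llama_new_context_with_model(model, cparams);
    if (ctx == NULL) {
        llama_free_model(model);
        return 1;
    }

    // ... run inference ...

    llama_free(ctx);
    llama_free_model(model);
    return 0;
}
```

Note that flipping `flash_attn` off does not shrink the binary: the kernels are still compiled in, which is what the proposal above wants to make optional at build time.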

First encountered when testing https://github.com/ggml-org/llama.cpp/pull/11867, but this is a problem in master too. Debugged to a bug in rocm-clr: https://github.com/ROCm/clr/issues/138. This issue tracks that currently non-default builds with GGML_HIP_GRAPHS=On are...

AMD GPU

### Name and Version

Docker Image: ghcr.io/ggerganov/llama.cpp:full-rocm 4fbeb701689e

```
root@5de0b21ea186:/app# ./llama-cli --version
version: 0 (unknown)
built with AMD clang version 16.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.6.0 23243 be997b2f3651a41597d7a41441fff8ade4ac59ac) for x86_64-unknown-linux-gnu
```

### Operating...

bug-unconfirmed

This commit adds a preset for llama.vim to use the default Qwen 2.5 Coder 1.5B model. The motivation for this change is to make it easier to start a server...

Allow loading little-endian models on big-endian systems. This would allow using any models downloaded via Ollama unmodified.

testing
examples
python
ggml
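The change proposed above amounts to byte-swapping tensor data at load time whenever the file's endianness does not match the host's. A minimal sketch of the idea in C (hypothetical helpers, not llama.cpp's actual loader code; only the 4-byte element case is shown, and quantized block types would need per-field handling):

```c
#include <stdint.h>
#include <stddef.h>

// Hypothetical helper: swap a buffer of 4-byte elements (e.g. f32 or i32
// tensor data) in place, converting little-endian file data for use on a
// big-endian host.
static void byteswap32_inplace(void * data, size_t n_elements) {
    uint32_t * p = (uint32_t *) data;
    for (size_t i = 0; i < n_elements; i++) {
        uint32_t v = p[i];
        p[i] = (v >> 24) | ((v >> 8) & 0x0000FF00u)
             | ((v << 8) & 0x00FF0000u) | (v << 24);
    }
}

// Runtime endianness check (portable C has no standard macro for this).
static int host_is_big_endian(void) {
    const uint16_t one = 1;
    return *(const uint8_t *) &one == 0;
}
```

The labels on this entry (testing, examples, python, ggml) suggest the real change touches both the ggml loader and the Python conversion scripts, since each tensor type needs its own swap logic.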

### Name and Version

ggml_opencl: using kernels optimized for Adreno (GGML_OPENCL_USE_ADRENO_KERNELS)
version: 4727 (c2ea16f2)
built with Android (11349228, +pgo, +bolt, +lto, -mlgo, based on r487747e) clang version 17.0.2 (https://android.googlesource.com/toolchain/llvm-project d9f89f4d16663d5012e5c09495f3b30ece3d2362)...

bug-unconfirmed

### Name and Version
latest version

### Operating systems
Linux

### GGML backends
CUDA

### Hardware
A800-40G

### Models
R1 Q4km

### Problem description & steps to reproduce
GGML_SCHED_MAX_BACKENDS asser...

bug-unconfirmed
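For readers hitting the same assertion: GGML_SCHED_MAX_BACKENDS is a compile-time cap on how many backends (roughly one per GPU plus the CPU) the ggml scheduler can hold in its fixed-size arrays. A sketch of the pattern involved (illustrative struct and helper, not ggml's actual scheduler code; the constant's value here is an assumption):

```c
#include <assert.h>

#define GGML_SCHED_MAX_BACKENDS 16 // assumed value; the real constant lives in ggml's scheduler source

struct sched_like {
    void * backends[GGML_SCHED_MAX_BACKENDS]; // fixed-size storage, no dynamic growth
    int    n_backends;
};

// Hypothetical helper showing the failure mode: registering one backend
// too many trips the assert instead of resizing the array.
static void sched_add_backend(struct sched_like * s, void * backend) {
    assert(s->n_backends < GGML_SCHED_MAX_BACKENDS);
    s->backends[s->n_backends++] = backend;
}
```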

### Name and Version

Following the steps in the [Usage of MiniCPM-o 2.6](https://github.com/ggerganov/llama.cpp/blob/master/examples/llava/README-minicpmo2.6.md#usage-of-minicpm-o-26) section, converting the PyTorch model to GGUF files fails:

```bash
sam@sam-pc:~/workspace/llama.cpp$ python ./examples/llava/minicpmv-surgery.py -m /home/sam/workspace/models/MiniCPM-o-2_6
Traceback...
```

bug-unconfirmed

The notebook https://github.com/werruww/HIGGS/blob/main/bamba_9bgguf%20(1).ipynb does not run.

### Name and Version
version: 4737 (5137da7b)
built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for aarch64-linux-gnu

### Operating systems
Linux

### Which llama.cpp modules do you know to be affected?
llama-cli...

bug-unconfirmed