llama.cpp
LLM inference in C/C++
> > Hi [@TerryT9](https://github.com/TerryT9), I encountered the same issue and I'm not sure how to resolve it. Could you share how you solved it?
>
> Hi [@Gianthard-cyh](https://github.com/Gianthard-cyh), would...
### Name and Version

build: 5363 (c2b6fec6) with Android (11349228, +pgo, +bolt, +lto, -mlgo, based on r487747e) clang version 17.0.2 (https://android.googlesource.com/toolchain/llvm-project d9f89f4d16663d5012e5c09495f3b30ece3d2362) for x86_64-unknown-linux-gnu (debug)

### Operating systems

Linux

###...
After cross-compiling on Linux, a test run on a Snapdragon 8 Gen 4 device fails with an error.

Build on Linux:

```
cmake .. -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-28 -DCMAKE_C_FLAGS="-march=armv8.7a" -DCMAKE_CXX_FLAGS="-march=armv8.7a" -DGGML_OPENMP=OFF -DGGML_LLAMAFILE=OFF -DGGML_QNN=ON -DGGML_QNN_DEFAULT_LIB_SEARCH_PATH=/data/local/tmp
```

I used qnn_sdk_version=2.31.0.250130 and pushed the Qualcomm shared libraries to the device.

Test on the device:

```
export LD_LIBRARY_PATH=/data/local/tmp/mllm/install-android/lib:/data/local/tmp/mllm/qnn-lib
./llama-cli -m ../../models/Qwen2.5-0.5B-Instruct-F16.gguf
```

Error:

```
llama_context: n_ctx_per_seq (4096) <...
```
I solved this problem with the help of an AI assistant. The error was:

```
/data/data/com.termux/files/home/llamaqnn/ggml/src/ggml-qnn/utils.cpp:253:17: error: reference to unresolved using declaration
  253 |     return std::aligned_alloc(alignment, size);
      |                 ^
/data/data/com.termux/files/usr/include/c++/v1/cstdlib:150:1: note: using declaration annotated with 'using_if_exists' here
  150 |...
```
### Git commit

e36ad89528a0276331e3c22f153d6837c353c5cf

### Operating systems

Linux

### GGML backends

CPU

### Problem description & steps to reproduce

I followed this procedure to build and convert the model into...
### Prerequisites

- [x] I am running the latest code. Mention the version if possible as well.
- [x] I carefully followed the [README.md](https://github.com/ggml-org/llama.cpp/blob/master/README.md).
- [x] I searched using keywords...
I found something weird about Hexagon-NPU backend performance. First, I ran `test-backend-ops` on an Android device with a Snapdragon 8 Gen 3, and the result was normal:

```
MUL_MAT(type_a=f16,type_b=f32,m=16416,n=1,k=128,bs=[8,1],nr=[4,1],per=[0,2,1,3],v=0): 744 runs - 22486.85 us/run - 134.48 MFLOP/run -...
```
### Prerequisites

- [x] I am running the latest code. Mention the version if possible as well.
- [x] I carefully followed the [README.md](https://github.com/ggml-org/llama.cpp/blob/master/README.md).
- [x] I searched using keywords...
### Name and Version

Test device: Snapdragon 8 Gen 3
Test model: qwen2.5-1.5b-instruct-fp16.gguf

### Operating systems

Android

### GGML backends

QNN

### Hardware

Snapdragon 8 Gen 3

### Models

_No response_

### Problem description & steps to reproduce

Same as the title (inference works normally when the op is assigned to qnn_gpu)

###...