> The prefilling stage is performed by QNN & CPU, and the inference stage is performed by the CPU. Interestingly, during the prefill stage we typically deal with larger tensors, which...
Hi @Gianthard-cyh, we've recently added a new `hexagon-npu` backend that is completely independent of QNN; parts of its source code run inside QuRT and can manipulate the HVX registers directly, similar to...
> [@chraac](https://github.com/chraac) [@Gianthard-cyh](https://github.com/Gianthard-cyh) I tested the QNN backend on Snapdragon 8 Gen 4 and found that bind_tensor accounts for 84% of the time (46500ms per decode), while qnn_graph->execute uses 14%...
Hi @Dantetang @Gianthard-cyh @cm4ker, sorry to bother you! We've applied some optimizations to the hexagon-npu backend so it now utilizes HVX instructions. While it's still slower than the CPU backend, the performance...
> I successfully built llama with QNN support, but when I try to compile Hexagon support, it seems the SDK I downloaded (the `HEXAGON_SDK_ROOT` env var) is wrong. Can you point me...
Created a discussion for the Windows build instructions of the `hexagon-npu` backend here: [#44](https://github.com/chraac/llama.cpp/discussions/44)
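In the meantime, a quick note on the env var itself: `HEXAGON_SDK_ROOT` just needs to point at the root of the Hexagon SDK install from Qualcomm Package Manager. The path and version below are only an illustration, not taken from the build instructions:

```sh
# illustrative only -- substitute your actual install location and SDK version
export HEXAGON_SDK_ROOT="$HOME/Qualcomm/Hexagon_SDK/6.0.0.2"
ls "$HEXAGON_SDK_ROOT/tools"   # sanity check: the tools/ directory should exist
```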
> Hi, I'm curious about the current matmul implementation. I've noticed that implementations like ExecuTorch and PowerServe convert matmul into a QNN convolution to achieve the desired performance. However, if...
Oh, do you mind building it with the official `ndk-toolchain` on your host machine? That's the official way Google suggests, and its clang is relatively new and supports the newer C/C++...
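Roughly along these lines — not a verbatim recipe, just a sketch of configuring with the NDK's CMake toolchain file; the NDK path and the QNN CMake switch below are assumptions, so check this fork's build docs for the exact option names:

```sh
# sketch: cross-compile for Android with the official NDK toolchain file
# (ANDROID_NDK path and the -DGGML_QNN switch are assumptions -- verify
# the exact CMake options against this fork's documentation)
export ANDROID_NDK="$HOME/android-ndk-r26d"
cmake -B build-android \
    -DCMAKE_TOOLCHAIN_FILE="$ANDROID_NDK/build/cmake/android.toolchain.cmake" \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=android-31 \
    -DGGML_QNN=ON
cmake --build build-android -j
```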
Actually, you have to push the QNN-related dynamic libraries to your phone before running; a complete list of those libs can be found here: https://github.com/chraac/llama-cpp-qnn-builder/blob/dd7ba303a8e3213c8cafe330c0938b25c6bd788f/docker/build_in_container.sh#L83
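For example, something like this (library names, the hexagon-vXX directory, and paths are illustrative — the full list is in the script linked above):

```sh
# push the QNN runtime libs and the binary to the device
# (library names and the hexagon-vXX directory depend on your SoC and QNN SDK version)
adb push "$QNN_SDK_ROOT/lib/aarch64-android/libQnnSystem.so" /data/local/tmp/
adb push "$QNN_SDK_ROOT/lib/aarch64-android/libQnnHtp.so" /data/local/tmp/
adb push "$QNN_SDK_ROOT/lib/hexagon-v75/unsigned/libQnnHtpV75Skel.so" /data/local/tmp/
adb push build-android/bin/llama-cli /data/local/tmp/
```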
Maybe you can try overriding the `LD_LIBRARY_PATH` env var.
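Roughly like this (binary and model names are just placeholders):

```sh
# make the dynamic loader pick up the pushed .so files from the same directory
adb shell "cd /data/local/tmp && LD_LIBRARY_PATH=/data/local/tmp ./llama-cli -m model.gguf -p 'hello'"
```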