sparkle tang

Results 9 comments of sparkle tang

@chraac Thanks a lot for your reply! I indeed ran it with the root account. I'm quite new to this, and I was wondering, what exactly is the logcat output?...

@chraac I have executed this command; here is the output: ``` --------- beginning of main 12-07 03:02:01.537 22882 22882 I llama-cli: vendor/qcom/proprietary/adsprpc/src/rpcmem_android.c:210: set up allocator 0xb40000789fbbbf50 for DMA buf heap system,...

@chraac Thank you for your help! I attempted to switch to non-root ADB and rerun the test, yet it still failed. Based on the logcat output, it...

@chraac I'm unable to create a non-root user account. My only options are using commands like `adb shell pm create-user user_name` and `adb shell am switch-user USER_ID`. I've tried generating...

> Regarding the missing libQnnGgmlOpPackage, it's a kind of regression that was introduced in https://github.com/chraac/llama.cpp/pull/39. I've fixed it in the latest dev-refactoring and pushed it to GitHub today.

@chraac Thank you for your fix....

@chraac Thank you. I successfully ran the qnn-npu backend on the Snapdragon 8 Elite, but the decoding speed is significantly slower than when using the CPU. Do you have any...

> Nice catch! will fix it on dev-refactoring

Thank you for the quick response and confirmation

> Yeah, there's already an issue to track the performance problem, can have a...

@chraac @Gianthard-cyh I tested the QNN backend on Snapdragon 8 Gen 4 and found that `bind_tensor` accounts for 84% of the time (46500 ms per decode), while `qnn_graph->execute` uses 14% (7800 ms)....

@chraac, thank you for your advice. I'm using the qwen2-1_5b-instruct-fp16.gguf model, and according to the debug logs, the MUL_MAT operations are being offloaded to QNN. Here's the relevant debug log...