finneyyan
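When the sampling method is set to penalty (or to mixed with penalty included) and the penalty value is greater than 1, the Linux build compiles and runs without problems, but on the Android build llm_demo either answers with garbled text or reports a Segmentation fault. I tried two Android devices (Honor Magic6 / Honor V7) and both behave this way. All of the above runs are CPU-only.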
Running `./demo_qwen_npu` with the command from `run_qwen_qnn.sh` first prints the output below, and then the whole phone freezes (the shared libraries, model files, etc. have already been copied to the corresponding locations on the device) ``` [INFO] Wed Jul 23 19:42:06 2025 [/root/mllm/src/backends/qnn/QNNBackend.cpp:118] Backend: libQnnHtp.so [INFO] Wed Jul 23 19:42:06 2025 [/root/mllm/src/backends/qnn/QNNBackend.cpp:143] Backend build version: v2.35.0.250530123435_121478 [INFO] Wed Jul 23 19:42:06 2025 [/root/mllm/src/backends/qnn/QNNBackend.cpp:166] Initialize...
I found something odd about the Hexagon-NPU backend performance. First, I ran `test-backend-ops` on an Android 8gen3 device, and the result was normal: ``` MUL_MAT(type_a=f16,type_b=f32,m=16416,n=1,k=128,bs=[8,1],nr=[4,1],per=[0,2,1,3],v=0): 744 runs - 22486.85 us/run - 134.48 MFLOP/run -...
### Name and Version Test device: Snapdragon 8gen3 Test model: qwen2.5-1.5b-instruct-fp16.gguf ### Operating systems Android ### GGML backends QNN ### Hardware Snapdragon 8gen3 ### Models _No response_ ### Problem description & steps to reproduce Same as the title (inference works normally when assigned to qnn_gpu) ###...