
QNN NPU on Snapdragon 8 Gen3 (RedMagic Pad) with Qwen2-VL-2B: gibberish output then Aborted, CPU path OK

Open 970814 opened this issue 1 month ago • 0 comments

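Description

When running qwen2-vl-2b (multimodal) through mllm's QNN NPU path on a RedMagic tablet (Snapdragon 8 Gen3):
• CPU path: inference works and the output text is semantically correct;
• QNN NPU path:
  • the ViT embedding and prefill stages run to completion;
  • but the generated text is a long string of gibberish (a mix of Chinese, English, Japanese, and symbols, resembling random sampling at a very high temperature);
  • the process then exits with Aborted rather than a segfault.

I would like to confirm whether this is a problem with the current QNN / mllm / model configuration or a device compatibility problem, and would appreciate troubleshooting suggestions from the developers.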
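To make the "gibberish" observation less subjective when comparing the CPU and NPU paths, the script mix of the generated text can be measured. The following is only a minimal sketch: the function names, the script buckets, and the thresholds are illustrative assumptions and are not part of mllm.

```python
import unicodedata
from collections import Counter


def script_histogram(text: str) -> Counter:
    """Count non-space characters per coarse script bucket (from Unicode names)."""
    counts = Counter()
    for ch in text:
        if ch.isspace():
            continue
        name = unicodedata.name(ch, "")
        if name.startswith("CJK UNIFIED"):
            counts["han"] += 1
        elif name.startswith(("HIRAGANA", "KATAKANA")):
            counts["kana"] += 1
        elif name.startswith("CYRILLIC"):
            counts["cyrillic"] += 1
        elif name.startswith("LATIN"):
            counts["latin"] += 1
        else:
            counts["other"] += 1  # digits, punctuation, symbols, unnamed chars
    return counts


def looks_mixed(text: str, min_scripts: int = 3, min_share: float = 0.02) -> bool:
    """Heuristic (assumed thresholds): True if at least `min_scripts` scripts
    each account for at least `min_share` of the non-space characters."""
    counts = script_histogram(text)
    total = sum(counts.values())
    if total == 0:
        return False
    significant = [
        script for script, n in counts.items()
        if script != "other" and n / total >= min_share
    ]
    return len(significant) >= min_scripts
```

Running `looks_mixed` on the CPU output and on the NPU output for the same prompt gives a quick yes/no signal. It is a heuristic and cannot replace a numerical comparison of the two paths.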

Environment

QNN version: 2.35.0.250530
Hexagon SDK version: 5.5.6.0
Android NDK version: 26.3.11579264

Device/SoC: RedMagic Pad / Snapdragon 8 Gen3
Model: qwen2-vl-2b

Using the official models provided on HF: https://huggingface.co/mllmTeam/qwen-2-vl-2b-instruct-mllm/tree/main
Prefill stage: qwen2-vl-w8-i8bias-128.mllm
Decode stage (two attempts): qwen-2-vl-2b-instruct-q4_k.mllm and qwen-2-vl-2b-instruct-kai_q4_0.mllm

Relevant logs

  1. The QNN backend loads normally:
[INFO] ... QNN Backend Lib: libQnnHtp.so
[INFO] ... Profiling turned on; level = 2
[INFO] ... Registered Op Package: libQnnLLaMAPackage_CPU.so and interface provider: LLaMAPackageInterfaceProvider
[INFO] ... Registered Op Package: libQnnLLaMAPackage_HTP.so and interface provider: LLaMAPackageInterfaceProvider
[INFO] ... QNN Backend Build Id: v2.35.0.250530123435_121478
[INFO] ... QNN backend supports tensor sparsity
[INFO] ... QNN backend supports dynamic dimensions
  2. Some warnings and errors appear (they look configuration- or performance-related):
[WARNING] ... Mmap mode: Tensor 'visual.patch_embed.proj.weight' not found in model metadata.
...
QnnLogger(     0.0ms, 0) [ERROR]: QnnDsp <E> setSlcAllocator: can't enable option as it was not set on prepare
  3. The prefill stage completes:
vit embedding time: 14483 ms
Prefill:322 ms
  4. The generated text is obvious gibberish, for example (a short excerpt):
"忝iculoled?heimsto Ensrei-do炎tractsmcrestegrlected怎样泥切尔活动中积极作用ftsながら(soP故意ystick风?rt令OSdap灿烂ientsplier对自己的ancedNSTinationsedo形式推开 隔过于金bins决策das端ki景观ijd戒Lens长大制度 Royaleniechod中华文化美丽的妃径 (annisхоault.后备tablhardtараметyor\nemoji于stockasanigen samtdin där0提示aiényensi? minden相对"
  5. After inference finishes, the program exits with Aborted:
...
vit embedding time: 14483 ms
Prefill:322 ms
Aborted
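"Aborted" is the message a shell prints when a process is killed by SIGABRT (typically raised by `abort()`, a failed assertion, or an uncaught C++ exception), whereas a segfault corresponds to SIGSEGV. A minimal sketch for telling the two apart from an exit status follows. The helper name `describe_exit` is hypothetical, and it assumes the status was obtained either from Python's `subprocess` (negative values) or from the shell's `$?` (128 + signal number).

```python
import signal


def describe_exit(code: int) -> str:
    """Map a process exit status to a signal name where possible.

    Negative values follow subprocess.Popen.returncode (-N means the child
    was killed by signal N); values above 128 follow the shell convention
    128 + N. Anything else is treated as a normal exit status.
    """
    if code < 0:
        signum = -code
    elif code > 128:
        signum = code - 128
    else:
        return f"exited normally with status {code}"
    try:
        return f"killed by {signal.Signals(signum).name}"
    except ValueError:
        # Number does not correspond to a signal known on this platform.
        return f"killed by unknown signal {signum}"
```

On Linux and Android, SIGABRT is signal 6 and SIGSEGV is signal 11, so a shell status of 134 points at an abort and 139 at a segfault.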

logcat / FastRPC key logs (8 Gen3)
The following are the key logcat messages captured while running demo_qwen2_vl_npu.

1) FastRPC initialization and configuration

11-18 16:25:33.447 26055 26055 I demo_qwen2_vl_npu: vendor/qcom/proprietary/adsprpc/src/rpcmem_android.c:182: set up allocator 0xb4000072ec74b9d0 for DMA buf heap system, ION heap system, heap mask 0x2000000, flags 0x1, legacy flags 0x1
11-18 16:25:33.449 26055 26055 I demo_qwen2_vl_npu: vendor/qcom/proprietary/adsprpc/src/fastrpc_config.c:319: Reading configuration file: demo_qwen2_vl_npu.debugconfig
11-18 16:25:33.449 26055 26055 I demo_qwen2_vl_npu: vendor/qcom/proprietary/adsprpc/src/fastrpc_config.c:350: fastrpc_config_init: Couldn't find file demo_qwen2_vl_npu.debugconfig, errno (No such file or directory) at /data/local/tmp/mllm/qnn-lib, /vendor/lib64/rfs/dsp, /vendor/lib/rfsa/adsp,
11-18 16:25:33.449 26055 26055 I demo_qwen2_vl_npu: vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:4405: fastrpc_apps_user_init done  with default domain:3 and &fastrpc_trace:0x71f50780f0
11-18 16:25:33.449 26055 26055 I demo_qwen2_vl_npu: vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:4519: multidsplib_env_init: libcdsprpc.so loaded
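The logcat excerpts in this report use the `threadtime` layout (date, time, PID, TID, level, tag, message). A minimal parsing sketch is shown here to make the lines easier to filter by process or level. The `LogLine` type and `parse_logcat_line` name are illustrative, and the regular expression assumes this exact layout.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: "MM-DD HH:MM:SS.mmm  PID  TID L tag: message"
LOGCAT_RE = re.compile(
    r"^(?P<date>\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<pid>\d+)\s+(?P<tid>\d+)\s+(?P<level>[VDIWEF])\s+"
    r"(?P<tag>[^:]+):\s?(?P<msg>.*)$"
)


class LogLine(NamedTuple):
    date: str
    time: str
    pid: int
    tid: int
    level: str
    tag: str
    message: str


def parse_logcat_line(line: str) -> Optional[LogLine]:
    """Return a LogLine for a threadtime-style line, or None if it does not match."""
    m = LOGCAT_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    return LogLine(
        m["date"], m["time"], int(m["pid"]), int(m["tid"]),
        m["level"], m["tag"].strip(), m["msg"],
    )
```

For example, keeping only entries whose `level` is `W` or `E` isolates the warnings and errors from the informational FastRPC messages.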

2) SELinux / permission related (the DSP is nevertheless opened through the HAL in the end)


11-18 16:40:46.453 26839 26839 W demo_qwen2_vl_n: type=1400 audit(0.0:205678): avc:  denied  { search } for  name="/" dev="sde9" ino=2 scontext=u:r:shell:s0 tcontext=u:object_r:adsprpcd_file:s0 tclass=dir permissive=0
11-18 16:40:46.453 26839 26839 W demo_qwen2_vl_n: type=1400 audit(0.0:205679): avc:  denied  { getattr } for  path="/vendor/dsp" dev="sde9" ino=2 scontext=u:r:shell:s0 tcontext=u:object_r:adsprpcd_file:s0 tclass=dir permissive=0
...
11-18 16:40:47.473 26839 26839 I demo_qwen2_vl_npu: vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:3373: open_device_node: no access to default device of domain 3, open thru HAL, (sess_id 0)
...
11-18 16:40:47.507 26839 26839 I demo_qwen2_vl_npu: vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:3662: Created user PD on domain 3, dbg_trace 0x0, enabled attr=> RPC timeout:0, ...
11-18 16:40:47.510 26839 26900 I demo_qwen2_vl_npu: vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:667: Successfully set remote user thread priority to 192 and stack size to 17408 for domain 3
11-18 16:40:47.510 26839 26900 I demo_qwen2_vl_npu: vendor/qcom/proprietary/adsprpc/src/listener_android.c:116: listener thread starting
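The two `avc: denied` entries can be reduced to their essential fields (permission, source type, target type, object class) to see what the `shell` domain was refused. The sketch below is a minimal parser under the assumption that the audit message keeps the field order shown in the log. The `AvcDenial` type and `parse_avc_denial` name are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}.*?"
    r"scontext=(?P<scontext>\S+)\s+tcontext=(?P<tcontext>\S+)\s+"
    r"tclass=(?P<tclass>\S+)"
)


class AvcDenial(NamedTuple):
    perms: List[str]
    source_type: str
    target_type: str
    tclass: str


def selinux_type(context: str) -> str:
    """Extract the type field from a 'user:role:type:level' SELinux context."""
    return context.split(":")[2]


def parse_avc_denial(line: str) -> Optional[AvcDenial]:
    """Return the denied permissions and types, or None for non-AVC lines."""
    m = AVC_RE.search(line)
    if m is None:
        return None
    return AvcDenial(
        perms=m["perms"].split(),
        source_type=selinux_type(m["scontext"]),
        target_type=selinux_type(m["tcontext"]),
        tclass=m["tclass"],
    )
```

Applied to the entries above, this yields `search` and `getattr` denials for source type `shell` on `adsprpcd_file` directories, which is consistent with the later "no access to default device of domain 3, open thru HAL" message.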

3) QNN Skel / LLaMA package / DSP libc++ loading

11-18 16:25:34.470 26055 26085 I demo_qwen2_vl_npu: vendor/qcom/proprietary/adsprpc/src/mod_table.c:701: open_mod_table_open_from_static: reverse module apps_std opened with handle 0xf507add8 (idx 0)
11-18 16:25:34.470 26055 26085 D demo_qwen2_vl_npu: ... apps_std_fopen_fd: done for /data/local/tmp/mllm/qnn-lib/cdsp/./libQnnHtpV75Skel.so ...
11-18 16:25:34.478 26055 26085 D demo_qwen2_vl_npu: ... apps_std_fopen_fd: done for /data/local/tmp/mllm/qnn-lib/./libQnnHtpV75Skel.so ...
11-18 16:25:34.479 26055 26085 I demo_qwen2_vl_npu: ... Successfully opened file /data/local/tmp/mllm/qnn-lib/./libQnnHtpV75Skel.so
...
11-18 16:40:47.537 26839 26900 W demo_qwen2_vl_npu: ... apps_std_fopen_with_env_fd failed with 0xd for path /vendor/dsp/cdsp/./libc++.so.1 name ./libc++.so.1 (No such file or directory)
11-18 16:40:47.538 26839 26900 E demo_qwen2_vl_npu: ... Error 0xd: open_mod_table_handle_invoke failed for handle:0x71cf9dd8, sc:0x1f050100
11-18 16:40:47.540  2092  2105 I cdsprpcd: ... Successfully opened file /vendor/dsp/cdsp/libc++.so.1
...
11-18 16:40:47.553 26839 26900 W demo_qwen2_vl_npu: ... apps_std_fopen_with_env_fd failed with 0xd for path /vendor/dsp/cdsp/./libc++abi.so.1 ...
11-18 16:40:47.557  2092  2105 I cdsprpcd: ... Successfully opened file /vendor/dsp/cdsp/libc++abi.so.1
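In the excerpt above, each failed `libc++` lookup from the demo process is followed by a successful open of the same file by `cdsprpcd`. A minimal sketch for checking this mechanically over a longer log is given below. The function name is hypothetical, and treating a later success on the same normalized path as resolving the failure is an assumption, not a confirmed property of FastRPC.

```python
import posixpath
import re
from typing import Iterable, Set

FAIL_RE = re.compile(r"failed with 0x[0-9a-fA-F]+ for path (\S+)")
OK_RE = re.compile(r"Successfully opened file (\S+)")


def unresolved_open_failures(lines: Iterable[str]) -> Set[str]:
    """Return normalized paths whose open failed and was never followed
    by a 'Successfully opened file' message for the same path."""
    pending: Set[str] = set()
    for line in lines:
        fail = FAIL_RE.search(line)
        if fail:
            # "/vendor/dsp/cdsp/./libc++.so.1" -> "/vendor/dsp/cdsp/libc++.so.1"
            pending.add(posixpath.normpath(fail.group(1)))
            continue
        ok = OK_RE.search(line)
        if ok:
            pending.discard(posixpath.normpath(ok.group(1)))
    return pending
```

An empty result suggests, but does not prove, that the `libc++.so.1` and `libc++abi.so.1` warnings are not the cause of the abort.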

4) QoS settings and dspqueue creation


11-18 16:25:34.546 26055 26055 I demo_qwen2_vl_npu: ... remote_handle64_open: opened handle ... for file:///libQnnHtpV75Skel.so?qnn_skel_handle_invoke&_dom=cdsp ...
11-18 16:25:34.558 26055 26055 I demo_qwen2_vl_npu: ... remote_handle64_open: opened handle ... for file:///libdspqueue_rpc_skel.so?dspqueue_rpc_skel_handle_invoke...
11-18 16:25:34.560 26055 26055 I demo_qwen2_vl_npu: ... dspqueue_create: created Queue 0, ... for domain 3
...
11-18 16:40:48.205 26839 26839 I demo_qwen2_vl_npu: ... remote_handle_control_domain: requested QOS 1, latency 100 for domain 3 handle ...
11-18 16:40:48.205 26839 26839 I demo_qwen2_vl_npu: ... remote_handle_control_domain: requested QOS 3, latency 9999 for domain 3 handle ...
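The excerpts in sections 1) to 4) carry two different PIDs for the demo process (26055 around 16:25 and 26839 around 16:40), so they appear to come from two separate runs. A minimal sketch for grouping log lines by PID and reporting each run's time span is shown here. The function name and the `tag_prefix` default are illustrative assumptions.

```python
import re
from typing import Dict, Iterable, Tuple

# Assumed threadtime header: "MM-DD HH:MM:SS.mmm  PID  TID L tag:"
HEADER_RE = re.compile(
    r"^\d{2}-\d{2}\s+(\d{2}:\d{2}:\d{2}\.\d{3})\s+(\d+)\s+\d+\s+[VDIWEF]\s+([^:]+):"
)


def time_span_by_pid(
    lines: Iterable[str], tag_prefix: str = "demo_qwen2_vl"
) -> Dict[int, Tuple[str, str]]:
    """Map each PID whose tag starts with `tag_prefix` to (first, last) timestamp.

    Timestamps are compared as strings, which is valid only while all
    lines fall on the same day (fixed-width HH:MM:SS.mmm).
    """
    spans: Dict[int, Tuple[str, str]] = {}
    for line in lines:
        m = HEADER_RE.match(line)
        if m is None or not m.group(3).startswith(tag_prefix):
            continue
        ts, pid = m.group(1), int(m.group(2))
        first, last = spans.get(pid, (ts, ts))
        spans[pid] = (min(first, ts), max(last, ts))
    return spans
```

Separating the runs this way avoids attributing a warning from one run to the abort observed in another.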

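Extra Notes

• On the same RedMagic 8 Gen3 tablet, pure-CPU inference with qwen2-vl-2b produces normal output, which rules out problems in the tokenizer, vocab, or CPU decode logic;
• The problem currently appears only on the QNN NPU path:
  • prefill completes with normal performance;
  • yet the output is gibberish text;
  • and the process finally exits with Aborted.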

970814, Nov 18 '25 09:11