MNN Qwen3-0.6B NPU deployment: "Broad cast error" during "Convert for QNN, QualComn's NPU"

I followed the documentation to deploy Qwen3-0.6B. When converting the model for the QNN backend with python3 npu/generate_llm_qnn.py --model model --soc_id=57 --dsp_arch=v75, I got the following errors:

Step1: Make IO
blockSize=128 in main, 148
modelPath.c_str()=s model/llm.mnn in main, 152
llmConfigPath.c_str()=s model/llm_config.json in main, 153
CPU Group: [ 20  21  23  17  19  22  16  18 ], 800000 - 4100000
CPU Group: [ 14  13  6  1  15  3  5  4  7  12  0  2 ], 800000 - 5100000
CPU Group: [ 10  11  9  8 ], 800000 - 5200000
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0, sme2: 0
Successfully generate tmp/testdir/128/input.mnn and tmp/testdir/128/output.mnn.
Successfully generate tmp/testdir/1/input.mnn and tmp/testdir/1/output.mnn.

Cost:  2.0691800117492676  s
Step2: Seperate Model
model: /home/user/workspace/aiinfracompile/MNN/transformers/llm/export/model/llm.mnn
Convert for QNN, QualComn's NPU
gCacheDir.c_str()=s qnn in main, 884
CPU Group: [ 20  21  23  17  19  22  16  18 ], 800000 - 4100000
CPU Group: [ 14  13  6  1  15  3  5  4  7  12  0  2 ], 800000 - 5100000
CPU Group: [ 10  11  9  8 ], 800000 - 5200000
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0, sme2: 0
[Warning]: No QnnDevice_getPlatformInfo APILoad Cache file error.
Load Cache file error.
Broad cast error, dim1 = 1024, dim2 = 0
Compute Shape Error for /Add_3_output_0
Load Cache file error.
Broad cast error, dim1 = 1024, dim2 = 0
Compute Shape Error for /Add_8_output_0
Load Cache file error.
Broad cast error, dim1 = 1024, dim2 = 0
Compute Shape Error for /Add_13_output_0
Load Cache file error.
Broad cast error, dim1 = 1024, dim2 = 0
Compute Shape Error for /Add_18_output_0
Load Cache file error.
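
The message "Broad cast error, dim1 = 1024, dim2 = 0" reads like a failed element-wise broadcast during shape inference of the Add ops. As a rough illustration only, the sketch below applies the NumPy-style broadcasting rule (two sizes are compatible when they are equal or one of them is 1). That MNN applies exactly this rule at this point is an assumption, and broadcast_dim and broadcast_shape are hypothetical helper names, not MNN functions. The example shapes are also made up for illustration.

```python
def broadcast_dim(dim1, dim2):
    """Combine two dimension sizes using the NumPy-style broadcasting rule."""
    if dim1 == dim2:
        return dim1
    if dim1 == 1:
        return dim2
    if dim2 == 1:
        return dim1
    # Sizes differ and neither is 1: the shapes cannot be broadcast together.
    raise ValueError(f"Broad cast error, dim1 = {dim1}, dim2 = {dim2}")


def broadcast_shape(shape1, shape2):
    """Broadcast two shapes, aligning them from the trailing dimension."""
    rank = max(len(shape1), len(shape2))
    # Pad the shorter shape with leading 1s so both have the same rank.
    a = [1] * (rank - len(shape1)) + list(shape1)
    b = [1] * (rank - len(shape2)) + list(shape2)
    return [broadcast_dim(x, y) for x, y in zip(a, b)]


# Illustrative shapes only: a [1, 128, 1024] tensor plus a [1024] tensor works,
# while a trailing size of 0 against 1024 is rejected.
print(broadcast_shape([1, 128, 1024], [1024]))
try:
    broadcast_shape([1, 128, 1024], [0])
except ValueError as err:
    print(err)
```

Under this rule a size of 0 against 1024 can never be combined, so the error may mean that one input of each Add had no resolved shape when the check ran. This is a guess from the message text, not a confirmed diagnosis.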

Nov 14 '25 06:11 yinrun

If I ignore this problem and deploy anyway, I hit a new problem: it looks like some subgraphs cannot be executed properly.

manet:/data/local/tmp/MNN # ./llm_demo model/config_qnn.json
Can't open file:/sys/devices/system/cpu/cpufreq/boost/affected_cpus
CPU Group: [ 0  1 ], 364800 - 2265600
CPU Group: [ 5  6 ], 499200 - 2956800
CPU Group: [ 2  3  4 ], 499200 - 3148800
CPU Group: [ 7 ], 480000 - 3302400
(last_midr & (CPUINFO_ARM_MIDR_IMPLEMENTER_MASK | CPUINFO_ARM_MIDR_PART_MASK))=0x 4100d820 in _getInfoArm, 1234 
The device supports: i8sdot:1, fp16:1, i8mm: 1, sve2: 0, sme2: 0
config path is model/config_qnn.json
main, 266, cost time: 393.181000 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 274, cost time: 3016.519043 ms

User: test

A: Error in file /home/yinrun/workspace/aiinfracompile/MNN/source/backend/qnn/backend/QNNBackend.cpp, line 888: error code 1003
Error in file /home/yinrun/workspace/aiinfracompile/MNN/source/backend/qnn/backend/QNNBackend.cpp, line 888: error code 1003
Error in file /home/yinrun/workspace/aiinfracompile/MNN/source/backend/qnn/backend/QNNBackend.cpp, line 888: error code 1003
Error in file /home/yinrun/workspace/aiinfracompile/MNN/source/backend/qnn/backend/QNNBackend.cpp, line 888: error code 1003
Error in file /home/yinrun/workspace/aiinfracompile/MNN/source/backend/qnn/backend/QNNBackend.cpp, line 888: error code 1003
Error in file /home/yinrun/workspace/aiinfracompile/MNN/source/backend/qnn/backend/QNNBackend.cpp, line 888: error code 1003
Error in file /home/yinrun/workspace/aiinfracompile/MNN/source/backend/qnn/backend/QNNBackend.cpp, line 888: error code 1003
Error in file /home/yinrun/workspace/aiinfracompile/MNN/source/backend/qnn/backend/QNNBackend.cpp, line 888: error code 1003
Error in file /home/yinrun/workspace/aiinfracompile/MNN/source/backend/qnn/backend/QNNBackend.cpp, line 888: error code 1003
!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!%!^C
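
Both logs in this thread repeat the same few messages many times, which makes it hard to see which tensors and which error codes are involved. The sketch below is one possible way to collapse such a log into a short summary. The regular expressions are written against the message formats shown in this thread only, and summarize_log is a hypothetical helper, not part of MNN.

```python
import os
import re
from collections import Counter

# Patterns written against the messages that appear in the logs in this thread.
SHAPE_ERR = re.compile(r"Compute Shape Error for (\S+)")
BROADCAST_ERR = re.compile(r"Broad cast error, dim1 = (\d+), dim2 = (\d+)")
QNN_ERR = re.compile(r"Error in file (\S+), line (\d+): error code (\d+)")


def summarize_log(text):
    """Collapse repeated error lines into a small summary dictionary."""
    summary = {
        "shape_errors": [],           # tensor names, in order of appearance
        "broadcast_dims": Counter(),  # (dim1, dim2) -> number of occurrences
        "qnn_errors": Counter(),      # (file name, line, error code) -> count
    }
    for line in text.splitlines():
        match = SHAPE_ERR.search(line)
        if match:
            summary["shape_errors"].append(match.group(1))
            continue
        match = BROADCAST_ERR.search(line)
        if match:
            dims = (int(match.group(1)), int(match.group(2)))
            summary["broadcast_dims"][dims] += 1
            continue
        match = QNN_ERR.search(line)
        if match:
            key = (
                os.path.basename(match.group(1)),
                int(match.group(2)),
                int(match.group(3)),
            )
            summary["qnn_errors"][key] += 1
    return summary
```

Calling summarize_log on the saved console output of the conversion step and of llm_demo gives the list of failing tensor names plus a count per QNN error code, which is easier to attach to a bug report than the raw log.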

Nov 14 '25 08:11 yinrun

Has this problem been resolved?

Nov 25 '25 03:11 zx104972

Has this problem been resolved?

No, I have not found the cause.

Nov 26 '25 06:11 yinrun

Hello, have you run into this problem?

12-05 18:31:34.428 31510 31510 E MNNJNI : Compute Shape Error for qnn/graph0.bin

Dec 05 '25 10:12 zx104972

Qwen3-0.6B currently has this problem on 8gen3 devices but works normally on 8gen5. We are not sure of the cause yet either.

Dec 08 '25 02:12 Qxinyu

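I hit the same problem when compiling the qwen3-4b and qwen3-1.7b models. I am compiling for the elite version (--soc_id=69 --dsp_arch=v79). Is there a way to solve this now? If I ignore the conversion problem and run llm_demo directly on the phone, it crashes with a segmentation fault.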

Dec 11 '25 08:12 TheLogan6

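Qwen3-0.6B currently has this problem on 8gen3 devices but works normally on 8gen5. We are not sure of the cause yet either.

It does not work on the 8gen5 I tried either.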

Dec 13 '25 07:12 yinrun

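Qwen3-0.6B currently has this problem on 8gen3 devices but works normally on 8gen5. We are not sure of the cause yet either.

It does not work on the 8gen5 I tried either.

You enabled --seperate_embed when exporting the model, so the embeddings_bf16.bin file is required at the execution stage.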
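
A quick way to rule out the missing-file case before running llm_demo is to check that the embedding file is present. The sketch below assumes that embeddings_bf16.bin is expected in the same directory as the config file passed to llm_demo; the thread does not state the location, so adjust it if your layout differs. missing_model_files is a hypothetical helper name.

```python
from pathlib import Path


def missing_model_files(config_path, required=("embeddings_bf16.bin",)):
    """Return the required file names that are absent next to the config file.

    Assumption: the runtime looks for these files in the config's directory.
    """
    model_dir = Path(config_path).parent
    return [name for name in required if not (model_dir / name).is_file()]
```

For example, missing_model_files("model/config_qnn.json") returns an empty list when the embedding file is in place, and ["embeddings_bf16.bin"] when it still has to be copied to the device.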

Dec 15 '25 01:12 Qxinyu