MNN: Qwen2-VL-2B and Qwen2.5-VL-3B with the LLM part on OpenCL: the first inference is correct, but subsequent runs output nothing but exclamation marks!

Prerequisites: download the Qwen2.5-VL-3B-Instruct-MNN and Qwen2-VL-2B-Instruct-MNN models from the model links given in the documentation.
Test image: demo.jpeg
Test prompt:
config.json settings:
{
  "llm_model": "llm.mnn",
  "llm_weight": "llm.mnn.weight",
  "backend_type": "opencl",
  "thread_num": 64,
  "precision": "low",
  "memory": "low"
}
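To compare backends with everything else held constant, the config above can be cloned with only `backend_type` changed. This is a minimal sketch: the helper names and output file names are made up for illustration, and `"cpu"` as a valid `backend_type` value is an assumption to check against the MNN documentation.

```python
import json

# Base settings copied from the config.json in this report.
BASE_CONFIG = {
    "llm_model": "llm.mnn",
    "llm_weight": "llm.mnn.weight",
    "backend_type": "opencl",
    "thread_num": 64,
    "precision": "low",
    "memory": "low",
}


def with_backend(config, backend):
    """Return a copy of config that differs only in backend_type."""
    variant = dict(config)
    variant["backend_type"] = backend
    return variant


def write_config(config, path):
    """Write one config variant to disk as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)


if __name__ == "__main__":
    # "cpu" is an assumed backend name; the file names are hypothetical.
    write_config(with_backend(BASE_CONFIG, "cpu"), "config_cpu.json")
    write_config(BASE_CONFIG, "config_opencl.json")
```

Since `llm_model` and `llm_weight` are relative names, the generated files would presumably need to sit next to the model files.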

Results (over repeated runs it is obvious that the LLM output is unstable, sometimes correct and sometimes wrong, which is abnormal for inference):

First run (correct result):
(base) yyds@yyds:~/Codes/work/MNN/build $ /home/yyds/Codes/work/MNN/build/llm_demo /media/yyds/cyy2t/move/Qwen2.5-VL-3B-Instruct-MNN/config.json /media/yyds/cyy2t/move/prompt.txt
CPU Group: [ 20 21 31 23 25 17 27 19 29 30 22 28 24 18 16 26 ], 800000 - 4300000
CPU Group: [ 14 6 13 1 15 3 4 5 2 7 12 0 ], 800000 - 5500000
CPU Group: [ 10 11 9 8 ], 800000 - 5800000
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0, sme2: 0
config path is /media/yyds/cyy2t/move/Qwen2.5-VL-3B-Instruct-MNN/config.json
Can't open file:tmp/mnn_cachefile.bin
Load Cache file error.
tokenizer_type = 3
Sampler: greedy
load /media/yyds/cyy2t/move/Qwen2.5-VL-3B-Instruct-MNN/llm.mnn ... Load Module Done!
Clone Decode Module Done!
main, 222, cost time: 1587.463013 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 226, cost time: 0.006000 ms
prompt file is /media/yyds/cyy2t/move/prompt.txt
The image depicts a serene beach scene during sunset. A person is sitting on the sand, wearing a plaid shirt and a hooded jacket. The person is holding a smartphone in front of them, possibly taking a photo or recording a video. A dog, which appears to be a large breed, is sitting next to the person, looking at the phone. The background shows the ocean with waves crashing onto the shore, and the sky is filled with warm hues of orange and pink, indicating that it is either sunrise or sunset. The overall atmosphere of the image is calm and peaceful.

#################################
prompt tokens num = 239
decode tokens num = 118
vision time = 2.16 s
audio time = 0.00 s
prefill time = 0.59 s
decode time = 1.11 s
sample time = 0.59 s
prefill speed = 403.19 tok/s
decode speed = 105.85 tok/s
##################################
Update cache to tmp/mnn_cachefile.bin, size = 3044016
Open tmp/mnn_cachefile.bin error
Write Cache File error!

Second run (nothing changed, wrong result):
(base) yyds@yyds:~/Codes/work/MNN/build $ /home/yyds/Codes/work/MNN/build/llm_demo /media/yyds/cyy2t/move/Qwen2.5-VL-3B-Instruct-MNN/config.json /media/yyds/cyy2t/move/prompt.txt
CPU Group: [ 20 21 31 23 25 17 27 19 29 30 22 28 24 18 16 26 ], 800000 - 4300000
CPU Group: [ 14 6 13 1 15 3 4 5 2 7 12 0 ], 800000 - 5500000
CPU Group: [ 10 11 9 8 ], 800000 - 5800000
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0, sme2: 0
config path is /media/yyds/cyy2t/move/Qwen2.5-VL-3B-Instruct-MNN/config.json
Can't open file:tmp/mnn_cachefile.bin
Load Cache file error.
tokenizer_type = 3
Sampler: greedy
load /media/yyds/cyy2t/move/Qwen2.5-VL-3B-Instruct-MNN/llm.mnn ... Load Module Done!
Clone Decode Module Done!
main, 222, cost time: 1603.531982 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 226, cost time: 0.005000 ms
prompt file is /media/yyds/cyy2t/move/prompt.txt
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! (the output continues as nothing but "!" to the end)
#################################
prompt tokens num = 239
decode tokens num = 512
vision time = 2.16 s
audio time = 0.00 s
prefill time = 0.63 s
decode time = 5.05 s
sample time = 3.00 s
prefill speed = 377.80 tok/s
decode speed = 101.37 tok/s
##################################
Update cache to tmp/mnn_cachefile.bin, size = 3044016
Open tmp/mnn_cachefile.bin error
Write Cache File error!

Third run (correct result):
(base) yyds@yyds:~/Codes/work/MNN/build $ /home/yyds/Codes/work/MNN/build/llm_demo /media/yyds/cyy2t/move/Qwen2.5-VL-3B-Instruct-MNN/config.json /media/yyds/cyy2t/move/prompt.txt
CPU Group: [ 20 21 31 23 25 17 27 19 29 30 22 28 24 18 16 26 ], 800000 - 4300000
CPU Group: [ 14 6 13 1 15 3 4 5 2 7 12 0 ], 800000 - 5500000
CPU Group: [ 10 11 9 8 ], 800000 - 5800000
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0, sme2: 0
config path is /media/yyds/cyy2t/move/Qwen2.5-VL-3B-Instruct-MNN/config.json
Can't open file:tmp/mnn_cachefile.bin
Load Cache file error.
tokenizer_type = 3
Sampler: greedy
load /media/yyds/cyy2t/move/Qwen2.5-VL-3B-Instruct-MNN/llm.mnn ... Load Module Done!
Clone Decode Module Done!
main, 222, cost time: 1590.616943 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 226, cost time: 0.005000 ms
prompt file is /media/yyds/cyy2t/move/prompt.txt
The image depicts a serene beach scene at sunset. A person is sitting on the sand, wearing a plaid shirt and a hooded jacket. The person is holding a smartphone in front of them, possibly taking a photo or recording a video. A dog, which appears to be a large breed, is sitting next to the person, looking at the phone. The background shows the ocean with waves crashing onto the shore, and the sky is filled with warm hues of orange and pink, indicating that it is either sunrise or sunset. The overall atmosphere of the image is calm and peaceful.

#################################
prompt tokens num = 239
decode tokens num = 118
vision time = 2.17 s
audio time = 0.00 s
prefill time = 0.59 s
decode time = 1.08 s
sample time = 0.56 s
prefill speed = 401.81 tok/s
decode speed = 109.38 tok/s
##################################
Update cache to tmp/mnn_cachefile.bin, size = 3044016
Open tmp/mnn_cachefile.bin error
Write Cache File error!

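I tried several more times after that; some runs were correct and some were wrong, which is quite abnormal. I printed the output of the vision model and it was stable every time, so I traced the problem to the LLM part running on the OpenCL backend. Note: running the LLM part on the CPU shows no problem.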
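Because the failure is intermittent, counting bad runs over a batch gives a firmer number than eyeballing a few logs. The following is only a sketch: the function names and the 16-character threshold are arbitrary choices, and it assumes the generated text reaches stdout.

```python
import re
import subprocess
import sys


def has_bang_run(text, min_run=16):
    """True if text contains at least min_run consecutive '!' characters."""
    return re.search("!{%d,}" % min_run, text) is not None


def count_bad_runs(cmd, runs=10):
    """Run cmd `runs` times and count the runs whose stdout has a '!' run."""
    bad = 0
    for _ in range(runs):
        out = subprocess.run(cmd, capture_output=True, text=True).stdout
        if has_bang_run(out):
            bad += 1
    return bad


if __name__ == "__main__":
    # Pass the llm_demo command line after the script name, for example:
    #   python repro.py /path/to/llm_demo /path/to/config.json /path/to/prompt.txt
    cmd = sys.argv[1:]
    if cmd:
        print(f"{count_bad_runs(cmd)} bad runs out of 10")
```

The check looks only for "!" runs, so the rows of "#" that frame the statistics block are not counted as failures.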

Apr 07 '25 07:04 CYYAI

Are you testing with the latest code? Judging by the commits, this should already be fixed.

Apr 08 '25 11:04 jxt1234

Please check whether your MNN code includes commit 7391896be30eb2cd21a4eceb97329a2c118dd8b3.
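One way to answer this without reading the log by hand is `git merge-base --is-ancestor`, which exits with status 0 when the given commit is reachable from the checked-out branch. A small wrapper, as a sketch (the function names are made up, and the repository path in the usage line is whatever your local checkout is):

```python
import subprocess


def ancestor_check_command(commit, ref="HEAD"):
    """Build the git command that tests whether commit is an ancestor of ref."""
    return ["git", "merge-base", "--is-ancestor", commit, ref]


def contains_commit(repo_dir, commit, ref="HEAD"):
    """True if ref in repo_dir contains commit.

    git exits 0 when commit is an ancestor of ref. Any other exit status,
    including errors such as an unknown commit, is reported as False here.
    """
    result = subprocess.run(
        ancestor_check_command(commit, ref), cwd=repo_dir, capture_output=True
    )
    return result.returncode == 0


if __name__ == "__main__":
    print(contains_commit(".", "7391896be30eb2cd21a4eceb97329a2c118dd8b3"))
```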

Apr 08 '25 11:04 jxt1234


Hello, I am on the latest code, and it does include this commit.

Apr 09 '25 02:04 CYYAI

https://github.com/alibaba/MNN/pull/3375

This commit fixes it.

Apr 10 '25 12:04 jxt1234


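Hello, I pulled the latest code (with the fix), and the problem still occurs across repeated runs.

I also found a new problem: setting the OpenCL backend for the vision part seems to have no effect. The vision time is 2.16 s; is it running on the CPU backend?

The configuration file is as follows:

{
  "llm_model": "llm.mnn",
  "llm_weight": "llm.mnn.weight",
  "backend_type": "opencl",
  "thread_num": 4,
  "precision": "low",
  "memory": "low",
  "mllm": {
    "backend_type": "opencl",
    "thread_num": 4,
    "precision": "low",
    "memory": "low"
  }
}

The run output is as follows: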

(base) yyds@yyds:~/Codes/work/MNN/build $ /home/yyds/Codes/work/MNN/build/llm_demo /media/yyds/cyy2t/move/Qwen2.5-VL-3B-Instruct-MNN/config.json /media/yyds/cyy2t/move/prompt.txt
CPU Group: [ 20 21 31 23 25 17 27 19 29 30 22 28 24 18 16 26 ], 800000 - 4300000
CPU Group: [ 14 6 13 1 15 3 4 5 2 7 12 0 ], 800000 - 5500000
CPU Group: [ 10 11 9 8 ], 800000 - 5800000
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0, sme2: 0
config path is /media/yyds/cyy2t/move/Qwen2.5-VL-3B-Instruct-MNN/config.json
Can't open file:tmp/mnn_cachefile.bin
Load Cache file error.
tokenizer_type = 3
Sampler: greedy
load /media/yyds/cyy2t/move/Qwen2.5-VL-3B-Instruct-MNN/llm.mnn ... Load Module Done!
Clone Decode Module Done!
main, 222, cost time: 1677.161011 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 226, cost time: 0.005000 ms
prompt file is /media/yyds/cyy2t/move/prompt.txt
The image depicts a serene beach scene at sunset. A person is sitting on the sand, wearing a plaid shirt and a hooded jacket. The person is holding a smartphone in front of them, possibly taking a photo or recording a video. A dog, which appears to be a large breed, is sitting next to the person, looking at the phone. The background shows the ocean with waves crashing onto the shore, and the sky is filled with warm hues of orange and pink, indicating that it is either sunrise or sunset. The overall atmosphere of the image is calm and peaceful.

#################################
prompt tokens num = 239
decode tokens num = 118
vision time = 2.18 s
audio time = 0.00 s
prefill time = 0.60 s
decode time = 1.17 s
sample time = 0.87 s
prefill speed = 396.02 tok/s
decode speed = 100.68 tok/s
##################################
Update cache to tmp/mnn_cachefile.bin, size = 3015836
Open tmp/mnn_cachefile.bin error
Write Cache File error!

(base) yyds@yyds:~/Codes/work/MNN/build $ /home/yyds/Codes/work/MNN/build/llm_demo /media/yyds/cyy2t/move/Qwen2.5-VL-3B-Instruct-MNN/config.json /media/yyds/cyy2t/move/prompt.txt
CPU Group: [ 20 21 31 23 25 17 27 19 29 30 22 28 24 18 16 26 ], 800000 - 4300000
CPU Group: [ 14 6 13 1 15 3 4 5 2 7 12 0 ], 800000 - 5500000
CPU Group: [ 10 11 9 8 ], 800000 - 5800000
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0, sme2: 0
config path is /media/yyds/cyy2t/move/Qwen2.5-VL-3B-Instruct-MNN/config.json
Can't open file:tmp/mnn_cachefile.bin
Load Cache file error.
tokenizer_type = 3
Sampler: greedy
load /media/yyds/cyy2t/move/Qwen2.5-VL-3B-Instruct-MNN/llm.mnn ... Load Module Done!
Clone Decode Module Done!
main, 222, cost time: 1663.178955 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 226, cost time: 0.005000 ms
prompt file is /media/yyds/cyy2t/move/prompt.txt
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! (the output continues as nothing but "!" to the end)
#################################
prompt tokens num = 239
decode tokens num = 512
vision time = 2.16 s
audio time = 0.00 s
prefill time = 0.61 s
decode time = 5.05 s
sample time = 3.17 s
prefill speed = 394.14 tok/s
decode speed = 101.35 tok/s
##################################
Update cache to tmp/mnn_cachefile.bin, size = 3015836
Open tmp/mnn_cachefile.bin error
Write Cache File error!
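To track the vision time question across runs, the figures can be pulled out of the statistics block rather than read by eye. In the logs above, the failing runs report 512 decoded tokens while the correct runs report 118, so the same parser also flags bad runs. A sketch, where the function name and the selection of fields are my own choices:

```python
import re

# Only the statistics of interest here; the field names are taken from the log.
STAT_RE = re.compile(
    r"(vision time|prefill time|decode time|decode tokens num)\s*=\s*([\d.]+)"
)


def parse_stats(log_text):
    """Extract selected 'name = value' statistics from llm_demo output."""
    return {name: float(value) for name, value in STAT_RE.findall(log_text)}


if __name__ == "__main__":
    sample = "decode tokens num = 512 vision time = 2.16 s prefill time = 0.61 s"
    print(parse_stats(sample))
```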

Apr 11 '25 02:04 CYYAI

If you create the tmp directory, is subsequent inference correct? It looks like nothing is being cached on any run.
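The log prints the cache path as the relative path `tmp/mnn_cachefile.bin`, so the directory presumably has to exist under whatever directory `llm_demo` is launched from (an assumption based on how relative paths normally resolve, not something confirmed in this thread). A sketch for creating it before a run, with a hypothetical helper name:

```python
import os


def ensure_cache_dir(workdir, cache_rel_path="tmp/mnn_cachefile.bin"):
    """Create the cache file's parent directory under workdir and return it."""
    cache_dir = os.path.join(workdir, os.path.dirname(cache_rel_path))
    os.makedirs(cache_dir, exist_ok=True)  # no error if it already exists
    return cache_dir


if __name__ == "__main__":
    # Run this from the directory where llm_demo will be started.
    print(ensure_cache_dir(os.getcwd()))
```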

Apr 14 '25 11:04 jxt1234


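Hello, creating the tmp directory does not help either. This problem is easy to reproduce. The vision output is fine (already verified); it is the LLM part that intermittently produces the !!!!!! error. The LLM part never shows this error with CPU inference, so it looks like an issue in the OpenCL inference engine. This was tested with freshly pulled code. Could you test it on your side?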

(base) yyds@yyds:~/Codes/work/MNN $ /home/yyds/Codes/work/MNN/build/llm_demo /media/yyds/cyy2t/move/Qwen2.5-VL-3B-Instruct-MNN/config.json /media/yyds/cyy2t/move/prompt.txt
CPU Group: [ 20 21 31 23 25 17 27 19 29 30 22 28 24 18 16 26 ], 800000 - 4300000
CPU Group: [ 14 6 13 1 15 3 4 5 2 7 12 0 ], 800000 - 5500000
CPU Group: [ 10 11 9 8 ], 800000 - 5800000
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0, sme2: 0
config path is /media/yyds/cyy2t/move/Qwen2.5-VL-3B-Instruct-MNN/config.json
tokenizer_type = 3
Sampler: greedy
load /media/yyds/cyy2t/move/Qwen2.5-VL-3B-Instruct-MNN/llm.mnn ... Load Module Done!
Clone Decode Module Done!
main, 222, cost time: 1682.418945 ms
Prepare for tuning opt Begin
Prepare for tuning opt End
main, 226, cost time: 0.006000 ms
prompt file is /media/yyds/cyy2t/move/prompt.txt
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! (the output continues as nothing but "!" to the end)
#################################
prompt tokens num = 239
decode tokens num = 512
vision time = 0.17 s
audio time = 0.00 s
prefill time = 0.29 s
decode time = 5.20 s
sample time = 3.35 s
prefill speed = 827.99 tok/s
decode speed = 98.49 tok/s
##################################

Apr 15 '25 02:04 CYYAI

Which GPU are you using? We will try to reproduce it.

Apr 15 '25 02:04 Qxinyu

Use the image and prompt above with your officially provided Qwen2.5-VL-3B model. After building, the problem appears when you run it several times. Build command:
cmake -DMNN_LOW_MEMORY=true \
  -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true \
  -DMNN_BUILD_LLM=true \
  -DMNN_SUPPORT_TRANSFORMER_FUSE=true \
  -DLLM_SUPPORT_VISION=true \
  -DMNN_BUILD_OPENCV=true \
  -DMNN_IMGCODECS=true \
  -DMNN_BUILD_CONVERTER=true \
  -DMNN_OPENCL=true \
  -DMNN_AVX512=true \
  -DMNN_USE_SYSTEM_LIB=true \
  -DMNN_SEP_BUILD=false \
  -DMNN_CUDA=true ..

Environment:
OS: Ubuntu 20.04.6 LTS
GPU: NVIDIA GeForce RTX 4070
CUDA version: cuda_11.8

Apr 15 '25 02:04 CYYAI

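Quoting jxt1234: "If you create the tmp directory, is subsequent inference correct? It looks like nothing is being cached on any run."

Update cache to tmp/mnn_cachefile.bin, size = 3015836
Open tmp/mnn_cachefile.bin error
Write Cache File error!

Does the presence of this file have a fairly large effect on performance? When I tested on Vulkan, I saw a big difference in prefill.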

Apr 29 '25 08:04 andyt9527


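This file stores the binaries of the kernels that were run and the local size settings. Generating it the first time is slow; later launches that load the cache are faster.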
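As a toy model of that tune-once-then-cache pattern (this is not MNN's code or its cache format; the function name, the JSON file, and the `measure` callback are all invented for illustration):

```python
import json
import os


def tuned_local_size(kernel, candidates, measure, cache_path):
    """Return the local size for kernel, tuning and caching it if absent.

    measure(size) returns a cost for one candidate; lower is better.
    """
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            cache = json.load(f)
    if kernel not in cache:
        # Slow path: evaluate every candidate once and keep the cheapest.
        cache[kernel] = min(candidates, key=measure)
        with open(cache_path, "w", encoding="utf-8") as f:
            json.dump(cache, f)
    return cache[kernel]
```

On the first call every candidate is measured; on later calls the stored choice is returned without measuring anything, which is where the faster startup comes from.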

Apr 29 '25 08:04 Qxinyu

This is a compatibility issue with the OpenCL softmax operator on NVIDIA. It has been fixed; please update the code and test again.
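The thread does not say what exactly went wrong inside the operator. Purely as an illustration of how a misbehaving softmax could surface as one repeated token: if it emits NaN, the NaN can propagate to the logits, and a greedy pick written with a plain `>` comparison then never moves off index 0. That index 0 decodes to "!" in the Qwen vocabulary is an assumption here, and MNN's own sampler may be implemented differently.

```python
import math


def softmax(scores):
    """Reference softmax that subtracts the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Index of the largest logit, using a plain '>' comparison."""
    best = 0
    for i, value in enumerate(logits):
        # Any comparison involving NaN is False, so best never changes.
        if value > logits[best]:
            best = i
    return best


if __name__ == "__main__":
    healthy = softmax([0.1, 2.0, 0.3])
    broken = [float("nan")] * 3
    print(greedy_pick(healthy), greedy_pick(broken))
```

With healthy values the pick follows the largest entry; with all-NaN values it stays at index 0 on every step, which would match the symptom of decoding the same token until the run stops.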

May 03 '25 07:05 jxt1234

Quoting Qxinyu: "This file stores the binaries of the kernels that were run and the local size settings. Generating it the first time is slow; later launches that load the cache are faster."

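I have seen that when the tempcache file is regenerated several times in the same environment, performance differs from one generated file to the next. What causes this?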

May 06 '25 09:05 andyt9527