王召德 comments

Results: 98 comments of 王召德

Integrate kleidiAI release v0.1.0 into MNN 2.9.3

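OK. I tested a symmetrically quantized model and it works without problems; decode performance is faster than MNN's original implementation. Testing `Qwen2-1.5B-int4` on an M3 Pro with 4 CPU threads gives the following speeds:

| | prefill | decode |
|:--:|:------:|:-------:|
| MNN | 330 | 75 |
| KleidiAI | 295 | 85 |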
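The relative change implied by the table can be worked out directly. This is a minimal sketch, assuming the figures are throughput values where higher means faster (the comment does not state the unit); the dictionary and function names are illustrative only.

```python
# Figures copied from the table above. The unit is not stated in the comment;
# they are assumed here to be rates, so a higher number means faster.
speeds = {
    "MNN": {"prefill": 330, "decode": 75},
    "KleidiAI": {"prefill": 295, "decode": 85},
}


def relative_change(baseline: float, candidate: float) -> float:
    """Return the change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


for phase in ("prefill", "decode"):
    change = relative_change(speeds["MNN"][phase], speeds["KleidiAI"][phase])
    print(f"{phase}: {change:+.1f}%")
```

Under that assumption, decode comes out roughly 13% faster with KleidiAI, while prefill comes out roughly 11% slower.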

After converting qwen2-vl to onnx, how can the visual part alone be converted to a plan with tensorrt?

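> How can the visual part alone be converted to a plan with tensorrt

I don't quite understand what this sentence means. After exporting, there will already be a `visual.onnx` model.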

After converting qwen2-vl to onnx, how can the visual part alone be converted to a plan with tensorrt?

This has not been tested yet.

After converting qwen2-vl to onnx, how can the visual part alone be converted to a plan with tensorrt?

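> > I tried it and it can now be converted to visual.onnx. Has anyone tried converting it to tensorrt yet?
>
> The visual.onnx I exported is only 1.3M, and there are many files whose names start with visual; I don't know how to use them. How large is the file you get on your side?

Because the model size exceeds 2GB, onnx stores the weights and the computation graph separately.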
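To see how such an export is split, the small graph file and the weight files next to it can be tallied. This is a rough sketch using only the standard library; it assumes the weight files sit in the same directory as `visual.onnx` and share the `visual` prefix, as described above. The function name and the directory in the usage comment are hypothetical.

```python
from pathlib import Path


def summarize_export(export_dir: str, prefix: str = "visual") -> dict:
    """Separate `<prefix>.onnx` (the graph) from the other files sharing the prefix."""
    graph_name = f"{prefix}.onnx"
    graph_bytes = 0
    sidecar_bytes = 0
    sidecars = []
    for path in sorted(Path(export_dir).iterdir()):
        # Skip directories and files that belong to other exported models.
        if not path.is_file() or not path.name.startswith(prefix):
            continue
        if path.name == graph_name:
            graph_bytes = path.stat().st_size
        else:
            sidecars.append(path.name)
            sidecar_bytes += path.stat().st_size
    return {
        "graph_bytes": graph_bytes,
        "sidecar_bytes": sidecar_bytes,
        "sidecars": sidecars,
    }


# Example call (the directory name is an assumption, not taken from the thread):
# print(summarize_export("./onnx"))
```

The graph file refers to its externally stored weights by relative path, so the `visual*` files should be kept in the same directory as `visual.onnx` when the model is loaded or passed to another tool.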

ModuleNotFoundError: No module named 'SwissArmyTransformer'

I had the same problem. The following resolves it:
- Install a pinned version: `pip install SwissArmyTransformer==0.2.8`

[Request]: Help in adding support for Models with Grouped Query Attention (GQA)

You can add `--export_test` to verify the onnx model. `--export_test` will run the onnx model with onnxruntime and compare the result with torch.
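The kind of check such a comparison performs can be sketched as follows. This is not the tool's actual implementation: it only illustrates comparing two flattened output vectors within a tolerance. The function names and the tolerance values are assumptions.

```python
import math
from typing import Sequence


def max_abs_diff(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two flattened outputs."""
    if len(reference) != len(candidate):
        raise ValueError("outputs have different lengths")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def outputs_match(
    reference: Sequence[float],
    candidate: Sequence[float],
    rel_tol: float = 1e-3,  # assumed tolerance, tune per model and precision
    abs_tol: float = 1e-5,  # assumed tolerance, guards values close to zero
) -> bool:
    """Return True when every element agrees within the given tolerances."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )
```

Here `reference` would hold the torch output and `candidate` the onnxruntime output, both flattened to plain lists of floats before the comparison.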

[Request]: Help in adding support for Models with Grouped Query Attention (GQA)

Can you convert the large-vocab-size `lm.onnx` to `lm.mnn`?

[Request]: Help in adding support for Models with Grouped Query Attention (GQA)

> This is the case with the qwen model too. I am not sure if this is just a json printing bug or if mnn processes int8 quantized values again while model...

[Request]: Help in adding support for Models with Grouped Query Attention (GQA)

> Okay so Type 1 and Type 2 Quantization for Convolutions are based on Sparse vs Compressed blob size for quantized weights and I am guessing MNN has both the...

[Request]: Help in adding support for Models with Grouped Query Attention (GQA)

Can you give me the tinyllama_multilingual model URL? I'll test and debug it for you.