
When will v2 be ready?

Open · zjd1988 opened this issue 3 months ago • 10 comments

Judging from the commit history, development is still iterating rapidly. Roughly when will v2 be ready?

zjd1988 · Sep 17 '25 01:09

Thank you for your interest. The core functionality of v2 is already stable: it can run full inference for qwen2_5vl and qwen2vl. Current work focuses on rounding out the operator library, supporting the QNN backend, and adding new models. It should be largely ready by mid-October.

chenghuaWang · Sep 17 '25 01:09

@chenghuaWang Great, I'm looking forward to using it in real projects. I'll start by familiarizing myself with your code.

zjd1988 · Sep 18 '25 02:09

@chenghuaWang Hi, have you benchmarked v2 against other frameworks, such as MNN and llama.cpp?

zjd1988 · Sep 23 '25 09:09


Benchmarking is already underway; we will compare against llama.cpp, MNN, and other frameworks.

chenghuaWang · Sep 23 '25 09:09

On the v2 branch, running python task.py tasks/build_android_qnn.yaml on macOS fails with:

FileNotFoundError: [Errno 2] No such file or directory: '/root/mllm/mllm/Backends/QNN/Kernels/MllmPackage'

zhenchong · Sep 23 '25 09:09

v2 is still under active development; the QNN backend has not been migrated yet and needs some more time. In the meantime, you can try the CPU version.

chenghuaWang · Sep 23 '25 09:09

On the v2 branch, Ubuntu 22 (x86), running:

mllm-convertor --input_path /home/ubuntu/work/model/model.safetensors --output_path /home/ubuntu/work/model/w4a32.mllm --model_name "Qwen3-0.6B" --cfg_path /home/ubuntu/work/model/quant_cfg_0.6B_w4a32_kai.json --pipeline w4a32_kai_pipeline --format v2

fails with:

Params Num: Before: 311, After: 312

Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/mllm-convertor", line 7, in <module>
    sys.exit(main())
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pymllm/utils/mllm_convertor.py", line 67, in main
    pipeline.stream_quantize(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pymllm/quantize/solver.py", line 137, in stream_quantize
    prepared_payload = pass_.run(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pymllm/quantize/kai/w4a32.py", line 89, in run
    weight: Tensor = tvm_ffi.get_global_func(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/tvm_ffi/registry.py", line 173, in get_global_func
    return core._get_global_func(name, allow_missing)
  File "python/tvm_ffi/cython/function.pxi", line 709, in core._get_global_func
ValueError: Cannot find global function mllm.quantize_pack.KaiLinear_f32_qai8dxp_qsi4c32p_mxk_nxk

@chenghuaWang

zhenchong · Sep 29 '25 02:09

The w4a32_kai_pipeline is designed specifically for ARM-based devices and does not support X86 architectures. Currently, only ARM devices (such as macOS systems with Apple Silicon chips and Android phones) can utilize this pipeline.

For pre-quantized models, please visit: https://www.modelscope.cn/models/mllmTeam/Qwen3-0.6B-w4a32kai/files
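
For example, the model can be fetched with the ModelScope Python SDK (a sketch; it assumes the modelscope package is installed and that the model id mllmTeam/Qwen3-0.6B-w4a32kai matches the link above):

```python
from modelscope import snapshot_download

# Download the pre-quantized w4a32 KAI model and print its local path.
local_dir = snapshot_download("mllmTeam/Qwen3-0.6B-w4a32kai")
print(local_dir)
```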

We plan to make this pipeline available for X86 architectures in the future, though this adaptation will introduce some overhead on the ARM side (due to packing requirements). We would greatly appreciate it if you could contribute to this feature by submitting a pull request.
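
For context, on an unsupported host the failure currently surfaces as an opaque ValueError from the TVM FFI lookup. A minimal sketch of an earlier guard, assuming a hypothetical helper at the pipeline entry point (not actual pymllm code), using only the Python standard library:

```python
import platform
import sys

def require_arm_for_kai() -> None:
    """Hypothetical guard: fail fast when the host CPU cannot run the
    KAI (KleidiAI) packing kernels, instead of surfacing a ValueError
    from tvm_ffi.get_global_func deep inside the w4a32 pass."""
    machine = platform.machine().lower()
    if machine not in ("arm64", "aarch64"):
        sys.exit(
            f"w4a32_kai_pipeline requires an ARM64 host, but this machine "
            f"reports '{machine}'. Use a pre-quantized model instead."
        )
```

Failing fast on platform.machine() would avoid the confusing missing-global-function error on x86 hosts.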

chenghuaWang · Sep 29 '25 05:09

@chenghuaWang Using the pre-quantized model from https://www.modelscope.cn/models/mllmTeam/Qwen3-0.6B-w4a32kai/files on the Android CPU build of the v2 branch, the output seems unrelated to the prompt:

Generating text for prompt: 介绍一下北京 [Introduce Beijing]
Qwen3-JNI com.mllm.demo I Text generation completed: 我们很高兴认识,可以为你提供各种信息和建议,帮助你解决问题,比如学习,或者生活中的各种问题都可以解决! [We are very happy to meet you; we can provide all kinds of information and advice to help you solve problems, whether about study or anything in daily life!]

zhenchong · Sep 30 '25 08:09

Qwen3's ModelScope model can be used in the command line interface (CLI) as follows:

./mllm-qwen3-runner -m ./Qwen3-0.6B-w4a32kai/model.mllm -mv v2 -c ./Qwen3-0.6B-w4a32kai/config.json -t ./Qwen3-0.6B-w4a32kai/tokenizer.json

However, Qwen3's ModelScope model is not compatible with Version 1, for the following reasons:

  1. We have adopted a faster quantization configuration (KAI), which is not supported in Version 1.
  2. The model format of Version 2 differs slightly from that of Version 1 (it carries additional shape size information; see the illustrative sketch below).
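
To illustrate what "additional shape size information" can look like in general, here is a purely hypothetical tensor-record writer. This is not mllm's actual v2 on-disk layout, only a sketch of the idea of a header that stores an explicit dimension count and per-dimension sizes:

```python
import struct
from typing import BinaryIO, Tuple

def write_tensor_record(f: BinaryIO, name: str, shape: Tuple[int, ...], raw: bytes) -> None:
    """Hypothetical v2-style record with explicit shape metadata (NOT mllm's real format)."""
    name_bytes = name.encode("utf-8")
    f.write(struct.pack("<I", len(name_bytes)))      # name length
    f.write(name_bytes)                              # tensor name
    f.write(struct.pack("<I", len(shape)))           # ndim: the extra shape information
    f.write(struct.pack(f"<{len(shape)}q", *shape))  # one int64 per dimension
    f.write(struct.pack("<Q", len(raw)))             # payload size in bytes
    f.write(raw)                                     # (quantized) weight payload
```

A v1-style reader that does not expect the extra ndim/dims fields would misparse such a file, which is one reason the two formats are not interchangeable.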

Currently, Version 2 is still under active development, and you can test it via the command line. As for applications and demos, Version 2 ships with an API server that our team is already working on; once it is ready, you will be able to use mllm with any chatbox-like application.
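
If the API server turns out to be OpenAI-compatible, any chat client should work against it. The endpoint path, port, and payload below are assumptions for illustration, not a confirmed mllm API:

```python
import json
import urllib.request

# Hypothetical client call; URL and payload shape are placeholders.
payload = {
    "model": "Qwen3-0.6B-w4a32kai",
    "messages": [{"role": "user", "content": "介绍一下北京"}],
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```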

chenghuaWang · Sep 30 '25 10:09