When will v2 be ready?
Looking at the commit history, development is still iterating rapidly. Roughly when do you expect v2 to be ready?
Thanks for your interest. The core functionality of v2 is already stable: it can run full inference for qwen2_5vl and qwen2vl. Current work focuses on rounding out the operator library, supporting the QNN backend, and adding new models. It should be basically ready around mid-October.
@chenghuaWang Great, I'm really looking forward to using it in real projects. In the meantime I'll familiarize myself with your code.
@chenghuaWang Hi, have you benchmarked v2 against other frameworks, such as MNN and llama.cpp?
On the v2 branch, running python task.py tasks/build_android_qnn.yaml on a Mac fails with:
FileNotFoundError: [Errno 2] No such file or directory: '/root/mllm/mllm/Backends/QNN/Kernels/MllmPackage'
v2 is still under active development; the QNN backend has not been migrated over yet and will take some more time. In the meantime, you can try the CPU version.
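For reference, a minimal way to find and run a CPU (non-QNN) build task instead; the task file name below is hypothetical, so list the tasks/ directory first to see what your checkout actually ships:

# list the build task files bundled with the repo
ls tasks/
# run an Android CPU build task (build_android.yaml is an assumed name;
# substitute whichever non-QNN task file exists in tasks/)
python task.py tasks/build_android.yaml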
On v2 (Ubuntu 22, x86):
mllm-convertor --input_path /home/ubuntu/work/model/model.safetensors --output_path /home/ubuntu/work/model/w4a32.mllm --model_name "Qwen3-0.6B" --cfg_path /home/ubuntu/work/model/quant_cfg_0.6B_w4a32_kai.json --pipeline w4a32_kai_pipeline --format v2
Params Num: Before: 311, After: 312
Traceback (most recent call last):
File "/home/ubuntu/.local/bin/mllm-convertor", line 7, in
The w4a32_kai_pipeline is designed specifically for ARM-based devices and does not support X86 architectures. Currently, only ARM devices (such as macOS systems with Apple Silicon chips and Android phones) can utilize this pipeline.
For pre-quantized models, please visit: https://www.modelscope.cn/models/mllmTeam/Qwen3-0.6B-w4a32kai/files
We plan to make this pipeline available for X86 architectures in the future, though this adaptation will introduce some overhead on the ARM side (due to packing requirements). We would greatly appreciate it if you could contribute to this feature by submitting a pull request.
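For example, the pre-quantized model can be fetched with the ModelScope CLI or via git; the commands below assume a recent modelscope package (pip install modelscope) and take the repo path from the link above:

# option 1: download with the ModelScope CLI
modelscope download --model mllmTeam/Qwen3-0.6B-w4a32kai --local_dir ./Qwen3-0.6B-w4a32kai
# option 2: clone directly from the ModelScope git mirror
git clone https://www.modelscope.cn/mllmTeam/Qwen3-0.6B-w4a32kai.git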
@chenghuaWang Using the pre-quantized model from https://www.modelscope.cn/models/mllmTeam/Qwen3-0.6B-w4a32kai/files, loaded on Android CPU with the v2 branch, the answers seem unrelated to the prompt:

Generating text for prompt: 介绍一下北京 ("Introduce Beijing")
Qwen3-JNI com.mllm.demo I Text generation completed: 我们很高兴认识,可以为你提供各种信息和建议,帮助你解决问题,比如学习,或者生活中的各种问题都可以解决! (roughly: "Nice to meet you! I can provide all kinds of information and advice to help you solve problems, whether with studying or anything else in daily life!")
Qwen3's ModelScope model can be used in the command line interface (CLI) as follows:
./mllm-qwen3-runner -m ./Qwen3-0.6B-w4a32kai/model.mllm -mv v2 -c ./Qwen3-0.6B-w4a32kai/config.json -t ./Qwen3-0.6B-w4a32kai/tokenizer.json
However, Qwen3's ModelScope model is not compatible with Version 1, for the following reasons:
- We have adopted a faster quantization configuration (KAI), which is not supported in Version 1.
- The model format of Version 2 differs slightly from that of Version 1 (it carries additional shape information).
Currently, Version 2 is still under active development, and you can test it via the command line. As for applications and demos, Version 2 will ship with an API server, which our team is already working on. Once it is ready, you will be able to use mllm with any chatbox-like application.
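Purely as an illustration of what that could look like once the server lands (the port, route, and OpenAI-style request schema below are assumptions, not mllm's confirmed API):

# hypothetical request to a local mllm API server; the /v1/chat/completions
# route and port 8080 are illustrative assumptions only
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen3-0.6B", "messages": [{"role": "user", "content": "Introduce Beijing"}]}'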