Support Qwen2-MoE Architecture

Open Hzfengsy opened this issue 10 months ago • 6 comments

Qwen1.5-MoE-A2.7B-Chat is an open-source 14B MoE model based on the Qwen2-MoE architecture. It is possible to run it on mobile devices.

Note that we need to support multi-device tensor parallelism (TP) for this architecture to handle future large models. However, I could not do that because I have limited devices.

cc @vinx13 @tqchen

Hzfengsy avatar Apr 05 '24 12:04 Hzfengsy

@DiegoCao can you help follow up and add TP support?

tqchen avatar Apr 05 '24 12:04 tqchen

Depends on https://github.com/apache/tvm/pull/16848, waiting for the next sync.

Hzfengsy avatar Apr 05 '24 15:04 Hzfengsy

Just rebased mlc-ai/relax. Let's trigger the CI tomorrow.

MasterJH5574 avatar Apr 05 '24 18:04 MasterJH5574

Got it, working on the TP support

DiegoCao avatar Apr 07 '24 04:04 DiegoCao

Waiting for dependencies: https://github.com/apache/tvm/pull/16887 and https://github.com/apache/tvm/pull/16886

BTW, there is a known numerical issue on Vulkan. Will fix it in a follow-up PR.

Hzfengsy avatar Apr 16 '24 07:04 Hzfengsy

@Hzfengsy it seems all deps have landed, let us follow up.

tqchen avatar Apr 29 '24 13:04 tqchen