mlc-llm
Support Qwen2-MoE Architecture
Qwen1.5-MoE-A2.7B-Chat is an open-source 14B-parameter MoE model based on the Qwen2-MoE architecture, with only 2.7B parameters activated per token, which makes it feasible to run on mobile devices.
Note that we also need to support multi-device tensor parallelism (TP) for this architecture to serve future larger models. However, I was unable to implement that due to limited hardware.
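For context on why the activated parameter count is so much smaller than the total, here is a minimal, illustrative sketch of top-k MoE routing (this is not mlc-llm's actual implementation; the function names are hypothetical): a router scores each token against all experts, and only the top-k experts run for that token.

```python
# Illustrative top-k MoE routing sketch (hypothetical, not mlc-llm code).
# A router scores a token against every expert; only the top-k experts
# execute, so activated params per token << total params.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def route_token(router_logits, k=2):
    """Pick the top-k experts for one token and renormalize their gate weights."""
    probs = softmax(router_logits)
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in topk)
    return [(i, probs[i] / total) for i in topk]

# Example: 4 experts, top-2 routing picks the two highest-scoring experts.
weights = route_token([0.1, 2.0, 0.5, 1.5], k=2)
```

With top-2 routing over 4 experts, only experts 1 and 3 would run here, and their renormalized gate weights sum to 1.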
cc @vinx13 @tqchen
@DiegoCao can you help follow up and add TP support?
depends on https://github.com/apache/tvm/pull/16848, waiting for the next sync
Just rebased mlc-ai/relax. Let's trigger the CI tomorrow.
Got it, working on the TP support
Waiting for dependencies: https://github.com/apache/tvm/pull/16887 and https://github.com/apache/tvm/pull/16886
BTW, there is a known numerical issue on Vulkan. Will fix it in a follow-up PR.
@Hzfengsy seems all deps have landed, let's follow up.