mlc-llm
Support Qwen2-MoE Architecture
Qwen1.5-MoE-A2.7B-Chat is an open-source 14B-parameter MoE model based on the Qwen2-MoE architecture, with only 2.7B parameters activated per token, which makes it feasible to run on mobile devices.
Note that we also need to support multi-device tensor parallelism (TP) for this architecture to serve future larger models. However, I was unable to implement that due to limited hardware.
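For context on why the activated parameter count is so much smaller than the total, here is a minimal, illustrative sketch of top-k MoE routing (this is not mlc-llm's actual implementation; the function names are hypothetical): a router scores each token against all experts, and only the top-k experts run for that token.

```python
# Illustrative top-k MoE routing sketch (hypothetical, not mlc-llm code).
# A router scores a token against every expert; only the top-k experts
# execute, so activated params per token << total params.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def route_token(router_logits, k=2):
    """Pick the top-k experts for one token and renormalize their gate weights."""
    probs = softmax(router_logits)
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in topk)
    return [(i, probs[i] / total) for i in topk]

# Example: 4 experts, top-2 routing picks the two highest-scoring experts.
weights = route_token([0.1, 2.0, 0.5, 1.5], k=2)
```

With top-2 routing over 4 experts, only experts 1 and 3 would run here, and their renormalized gate weights sum to 1.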
cc @vinx13 @tqchen
@DiegoCao can you help follow up and add TP support?
depends on https://github.com/apache/tvm/pull/16848, waiting for the next sync
Just rebased mlc-ai/relax. Let's trigger the CI tomorrow.
Got it, working on the TP support
Waiting for dependencies: https://github.com/apache/tvm/pull/16887 and https://github.com/apache/tvm/pull/16886
BTW, there is a known numerical issue on Vulkan. Will fix it in a follow-up PR.
@Hzfengsy seems all deps have landed, let's follow up.