[WIP] Qwen3 MoE support

Open • intervitens opened this pull request 6 months ago • 4 comments

This PR adds support for Qwen3 MoE (30B-A3B and 235B-A22B) models. Loss looked reasonable from a simple test with 30B-A3B on the Alpaca dataset.
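
For anyone who wants to reproduce that test, here is a minimal launch sketch. The distributed full-finetune recipe already exists in torchtune; the config name `qwen3_moe/30B_A3B_full` is only a placeholder for whatever config this PR ends up registering.

```bash
# Hypothetical launch sketch: full_finetune_distributed is an existing torchtune
# recipe, but the config name qwen3_moe/30B_A3B_full is a placeholder for the
# config added by this PR. Adjust --nproc_per_node to your hardware.
tune run --nproc_per_node 8 full_finetune_distributed \
  --config qwen3_moe/30B_A3B_full \
  dataset._component_=torchtune.datasets.alpaca_dataset
```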

TODO:

  • [ ] Tensor/Expert parallel
  • [x] Test 235B model
  • [x] Verify loss curves against HF implementation
  • [x] LoRA support
  • [ ] Documentation
  • [ ] Tests

intervitens • Jun 12, 2025

Helpful Links

See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2820

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot[bot] • Jun 12, 2025

@intervitens Hi, does this support training with fp8-compatible checkpoints?

dz1iang • Aug 6, 2025

Hello, during training I see the message: "Saving Qwen3 MoE adapter weights to PEFT format is not supported, saving to torchtune format instead." How can I obtain a Hugging Face (HF) checkpoint in this case? Is there a code example for reference? Additionally, I only set lora_attn_modules: ['q_proj', 'v_proj', 'output_proj'], apply_lora_to_mlp: False, and apply_lora_to_output: False. Can the tune_to_peft_adapter_weights logic be used with this configuration? Thank you.
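
For concreteness, the settings above correspond to something like the following invocation. The recipe name lora_finetune_distributed exists in torchtune, while the config name is a placeholder and the dotted override paths assume the usual layout where LoRA options are parameters of the model component.

```bash
# Hypothetical sketch of the configuration described above; the config name is a
# placeholder, and the overrides assume LoRA options live under `model`.
tune run --nproc_per_node 8 lora_finetune_distributed \
  --config qwen3_moe/30B_A3B_lora \
  model.lora_attn_modules='[q_proj,v_proj,output_proj]' \
  model.apply_lora_to_mlp=False \
  model.apply_lora_to_output=False
```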

dz1iang • Aug 13, 2025

Why not merge this in?

cinjon • Nov 18, 2025

Has anyone had success compiling the MoE here and training with it?

cinjon • Dec 4, 2025