hahahaahaa

1 issue by hahahaahaa

Hello :) I’d like to use Tutel as the MoE layer implementation in Nanotron to train a **Qwen3-MoE** 15B model from scratch with 128 experts and top-k = 8. Cluster...