Plans for support of Qwen models?
Are there plans to support Qwen models in MaxText?
Besides Gemma, the DeepSeek, Mistral, and Llama families are already there, but Qwen seems to be missing.
Hi, yes, we are looking into this. Are there particular Qwen variants you would want to see in MaxText? Any context or motivation would be helpful.
Awesome :) Qwen3 4B and 8B would be my primary interest. I'm planning to use one as a backbone for a speech model, training on TPUs provided by the TRC. Bigger models would also be interesting, but would be too hefty for me to fine-tune.
Yes, it is quite strange that Qwen is not included in the release, as it is a fairly popular and widely cited model family. I would particularly like to see the latest Qwen3 and Qwen3 MoE.