Support Qwen3
Compared with Qwen2.5, Qwen3 only removes the QKV bias in attention and adds qk_norm (as in Gemma 3), so it should be straightforward to support.
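To make the two changes concrete, here is a minimal pure-Python sketch of one attention logit with bias-free projections and qk_norm. It is not torchtune code: the function names (`linear`, `rms_norm`, `qk_norm_score`) are hypothetical, it assumes qk_norm means an RMSNorm over each head's query and key vector, and it leaves out positional embeddings, softmax, and batching.

```python
import math


def linear(x, weight, bias=None):
    """Compute y = W x (+ b). The Qwen3-style q/k/v projections pass bias=None."""
    out = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out


def rms_norm(vec, scale, eps=1e-6):
    """RMSNorm over one head's vector: vec / sqrt(mean(vec^2) + eps) * scale."""
    rms = math.sqrt(sum(v * v for v in vec) / len(vec) + eps)
    return [v / rms * s for v, s in zip(vec, scale)]


def qk_norm_score(x_q, x_k, w_q, w_k, q_scale, k_scale):
    """Attention logit for a single head: normalize q and k, then scaled dot product."""
    q = rms_norm(linear(x_q, w_q), q_scale)  # no bias, then qk_norm on the query
    k = rms_norm(linear(x_k, w_k), k_scale)  # no bias, then qk_norm on the key
    head_dim = len(q)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(head_dim)


# Toy usage with identity projection weights and unit norm scales.
identity = [[1.0, 0.0], [0.0, 1.0]]
ones = [1.0, 1.0]
score = qk_norm_score([3.0, 4.0], [3.0, 4.0], identity, identity, ones, ones)
```

Because q and k are normalized before the dot product, rescaling the input to either projection leaves the logit essentially unchanged, which is the point of adding the norm.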
Hi @pocca2048, thanks for creating the issue. It would be great to support this in torchtune. Do you have any interest in opening a PR? We would be happy to provide guidance on the implementation. You can take a look at the Qwen 2.5 PR as a reference: #1863
Unfortunately, I don’t have the time to work on this at the moment... 😢 If it’s still unclaimed when I have some availability, I’d be happy to give it a try.
I've created a draft PR with only the model builders (no recipes added yet): https://github.com/pytorch/torchtune/pull/2669
If someone can help review this, I can work on adding the recipes and verifying them. Just need some help to check if I'm on the right track with this.