MoE-Infinity
feat: performance improvement and Qwen3 support
Description
Major changes for performance improvement, plus support for the Qwen3 MoE model.
Motivation
- Support the latest Qwen3 MoE model
- Overlap the hidden-states gather with the expert copy
- Reduce PyTorch kernel launch overhead
Type of Change
- [ ] Bug fix
- [x] New feature
- [x] Breaking change
- [x] Documentation update
Checklist
- [x] I have read the CONTRIBUTION guide.
- [ ] I have updated the tests (if applicable).
- [x] I have updated the documentation (if applicable).