MoE-Infinity
feat: Merge kernels from vLLM and FlashInfer
Description
Fuse the MoE layer kernels by merging in kernels from vLLM and FlashInfer.
Motivation
Kernel launch overhead in the MoE layer is too large; fusing the kernels reduces the number of launches.
Type of Change
- [ ] Bug fix
- [x] New feature
- [x] Breaking change
- [x] Documentation update
Checklist
- [x] I have read the CONTRIBUTION guide.
- [x] I have updated the tests (if applicable).
- [x] I have updated the documentation (if applicable).