intervitens

3 issues by intervitens

This PR adds a custom floating-point quantization method powered by [TorchAO](https://github.com/pytorch/ao), which achieves high throughput thanks to the optimized [fp6_llm](https://github.com/usyd-fsalab/fp6_llm) kernel. Use `-q torchao --torchao-fp-bits 6` to load...
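To make "custom floating-point quantization" more concrete, here is a minimal pure-Python sketch of what rounding weights to a 6-bit float grid means numerically. It assumes a common FP6 layout (1 sign bit, 3 exponent bits, 2 mantissa bits, exponent bias 3, subnormals, and no inf/NaN codes) and a simple per-tensor absmax scale. The helper names `fp_grid`, `quantize_scalar`, and `fake_quantize` are hypothetical. This is not TorchAO's or fp6_llm's implementation, which pack bits and run fused GPU kernels. It only illustrates the number format.

```python
import math


def fp_grid(exp_bits: int, man_bits: int, bias: int) -> list[float]:
    """All non-negative values of a sign/exponent/mantissa float with no inf/NaN codes (assumed layout)."""
    values = set()
    for e in range(2 ** exp_bits):
        for m in range(2 ** man_bits):
            frac = m / 2 ** man_bits
            if e == 0:
                # Subnormal: no implicit leading 1, exponent fixed at 1 - bias.
                values.add(2.0 ** (1 - bias) * frac)
            else:
                # Normal: implicit leading 1.
                values.add(2.0 ** (e - bias) * (1.0 + frac))
    return sorted(values)


def quantize_scalar(x: float, grid: list[float]) -> float:
    """Round x to the nearest grid value (ties go to the smaller magnitude), saturating at the max."""
    mag = min(abs(x), grid[-1])
    nearest = min(grid, key=lambda g: abs(g - mag))
    return math.copysign(nearest, x)


def fake_quantize(weights: list[float], grid: list[float]) -> list[float]:
    """Hypothetical per-tensor absmax scaling into the grid range, then quantize and dequantize."""
    amax = max(abs(w) for w in weights)
    if amax == 0.0:
        return [0.0 for _ in weights]
    scale = amax / grid[-1]
    return [quantize_scalar(w / scale, grid) * scale for w in weights]


if __name__ == "__main__":
    # 1 sign + 3 exponent + 2 mantissa bits = 6 bits in total (assumed E3M2 split).
    FP6 = fp_grid(exp_bits=3, man_bits=2, bias=3)
    print(len(FP6), FP6[-1])  # 32 28.0
    print(quantize_scalar(0.3, FP6))  # 0.3125
```

With only 32 magnitudes available, the per-tensor (or, in practice, finer-grained) scale is what maps real weight ranges onto the small grid. The choice of how many exponent versus mantissa bits to spend trades dynamic range against precision.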

#### Context

- [ ] add a new feature
- [x] fix a bug
- [ ] update tests and/or documentation
- [ ] other (please add here)

#2659 added...

CLA Signed

This PR adds support for Qwen3 MoE models (30B-A3B and 235B-A22B). The loss looked reasonable in a simple test with 30B-A3B on the Alpaca dataset.

TODO:

- [ ] Tensor/Expert parallel...

CLA Signed