sglang
[Feature] Add an FP8 GEMM backend option for choosing the FP8 GEMM kernel
Checklist
- [ ] If this is not a feature request but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
- [ ] Please use English. Otherwise, it will be closed.
Motivation
Currently in SGLang, the choice of FP8 GEMM kernel is controlled by a series of environment variables or by implicit dispatch logic, as in https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/layers/quantization/fp8_utils.py#L151
For better control, we need a server argument like --fp8-gemm-runner-backend, similar to --moe-runner-backend.
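To make the request concrete, here is a rough sketch of what an explicit backend argument could look like. Only the flag name --fp8-gemm-runner-backend comes from this issue. The enum, the backend names ("auto", "deep_gemm", "cutlass", "triton"), the preference order, and the helper functions are all hypothetical placeholders for illustration and are not SGLang's actual code.

```python
import argparse
from enum import Enum
from typing import Iterable


class Fp8GemmRunnerBackend(Enum):
    # Hypothetical backend names, used only for illustration.
    AUTO = "auto"
    DEEP_GEMM = "deep_gemm"
    CUTLASS = "cutlass"
    TRITON = "triton"


# Assumed preference order used when the user leaves the choice on "auto".
AUTO_PREFERENCE = [
    Fp8GemmRunnerBackend.DEEP_GEMM,
    Fp8GemmRunnerBackend.CUTLASS,
    Fp8GemmRunnerBackend.TRITON,
]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the proposed flag (standalone, not SGLang's ServerArgs)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--fp8-gemm-runner-backend",
        type=str,
        choices=[b.value for b in Fp8GemmRunnerBackend],
        default=Fp8GemmRunnerBackend.AUTO.value,
        help="Which FP8 GEMM kernel backend to use.",
    )
    return parser


def select_fp8_gemm_backend(
    requested: str, available: Iterable[str]
) -> Fp8GemmRunnerBackend:
    """Resolve the requested backend against the backends usable on this machine."""
    available_set = set(available)
    backend = Fp8GemmRunnerBackend(requested)
    if backend is Fp8GemmRunnerBackend.AUTO:
        # Fall back to the first preferred backend that is available.
        for candidate in AUTO_PREFERENCE:
            if candidate.value in available_set:
                return candidate
        raise ValueError("No FP8 GEMM backend is available")
    if backend.value not in available_set:
        # An explicit request fails loudly instead of silently switching kernels.
        raise ValueError(f"Requested backend {backend.value!r} is not available")
    return backend
```

The point of the sketch is that one explicit argument replaces several environment variables, and that an explicit request raises an error when the backend is unavailable rather than silently dispatching to a different kernel.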
Related resources
No response
Will be tackling this
@b8zhong I saw you self-assigned, am I still good to work on this?
@b8zhong will work on this. Thanks anyway