
[Feature] Add a FP8 Gemm backend for choosing FP8 gemm kernel

Open Fridge003 opened this issue 1 month ago • 3 comments

Checklist

  • [ ] If this is not a feature request but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
  • [ ] Please use English. Otherwise, it will be closed.

Motivation

Currently in SGLang, the choice of FP8 Gemm kernel is controlled by a series of environment variables or by implicit dispatch logic, as in https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/layers/quantization/fp8_utils.py#L151

For better control, we need a server argument such as --fp8-gemm-runner-backend, similar to --moe-runner-backend.
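
As a rough illustration of the request above, here is a minimal sketch of how an explicit backend flag could take precedence over environment-variable dispatch. This is not SGLang's implementation: the backend names (`backend_a`, `backend_b`), the `auto` default, the environment variable `HYPOTHETICAL_USE_BACKEND_B`, and the helpers `build_parser` and `resolve_backend` are all assumptions made up for the example. Only the flag name `--fp8-gemm-runner-backend` comes from this issue.

```python
import argparse
import os
from enum import Enum


class Fp8GemmBackend(Enum):
    # Placeholder names; the real set of FP8 Gemm kernels is not listed in this issue.
    AUTO = "auto"
    BACKEND_A = "backend_a"
    BACKEND_B = "backend_b"


def build_parser():
    """Build a parser exposing the proposed flag (hypothetical choices and default)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--fp8-gemm-runner-backend",
        type=str,
        choices=[b.value for b in Fp8GemmBackend],
        default=Fp8GemmBackend.AUTO.value,
        help="Which FP8 Gemm kernel backend to use.",
    )
    return parser


def resolve_backend(cli_value, env=None):
    """Return the backend to use; an explicit flag value wins over env vars."""
    env = os.environ if env is None else env
    backend = Fp8GemmBackend(cli_value)
    if backend is not Fp8GemmBackend.AUTO:
        return backend
    # "auto" falls back to the legacy style of dispatch.
    # HYPOTHETICAL_USE_BACKEND_B is an invented variable name for illustration.
    if env.get("HYPOTHETICAL_USE_BACKEND_B", "0") == "1":
        return Fp8GemmBackend.BACKEND_B
    return Fp8GemmBackend.BACKEND_A


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(resolve_backend(args.fp8_gemm_runner_backend).value)
```

The point of the design is that the selection happens in one place and is visible in the launch command, with `auto` preserving whatever the current implicit behavior is.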

Related resources

No response

Fridge003 avatar Nov 22 '25 20:11 Fridge003

Will be tackling this

raayandhar avatar Nov 22 '25 20:11 raayandhar

@b8zhong I saw you self-assigned, am I still good to work on this?

raayandhar avatar Nov 22 '25 21:11 raayandhar

> @b8zhong I saw you self-assigned, am I still good to work on this?

@b8zhong will work on this. Thanks anyway.

Fridge003 avatar Nov 22 '25 21:11 Fridge003