Ma Mingfei
@dbyoung18 does this one support int4 WOQ (weight-only quantization)?
@zhyncs we need to reopen this one. We are currently working on an internal branch to make sure everything is ready, and then we will start upstreaming to the sglang main branch....
Commenting to keep this thread active; the optimization work is pretty much done internally.
Upstreaming the C++ kernels in https://github.com/sgl-project/sglang/pull/5150
Update to build with CMakeLists.txt: https://github.com/sgl-project/sglang/pull/6115
fp8 gemm: https://github.com/sgl-project/sglang/pull/6216
Enable the Intel AMX attention backend: https://github.com/sgl-project/sglang/pull/6143 is replaced by https://github.com/sgl-project/sglang/pull/6405 and https://github.com/sgl-project/sglang/pull/6408
Add fp8 shared MoE kernels: https://github.com/sgl-project/sglang/pull/6339. The shared MoE kernels are an innovation that we have done on the CPU backend; they bring a pretty good performance speedup for decoding when concurrency is...
Add fp8 support for the existing fused MoE kernels: https://github.com/sgl-project/sglang/pull/6404
Add docker build: https://github.com/sgl-project/sglang/pull/6458