Ma Mingfei

Results 93 comments of Ma Mingfei

@dbyoung18 does this one support int4 WOQ?

@zhyncs we need to reopen this one. We are currently working on an internal branch to make sure everything is ready, and then we will start upstreaming to the sglang main branch....

Commenting to keep this thread active; the optimization work is pretty much done internally.

Upstreaming the C++ kernels in https://github.com/sgl-project/sglang/pull/5150

update using CMakeLists.txt: https://github.com/sgl-project/sglang/pull/6115

fp8 gemm: https://github.com/sgl-project/sglang/pull/6216

Enable Intel AMX attention backend: https://github.com/sgl-project/sglang/pull/6143 is replaced by https://github.com/sgl-project/sglang/pull/6405 and https://github.com/sgl-project/sglang/pull/6408

Add fp8 shared MoE kernels: https://github.com/sgl-project/sglang/pull/6339. The shared MoE kernels are an innovation we made on the CPU backend; they bring a pretty good performance speedup for decoding when concurrency is...

add fp8 support for existing fused moe kernels: https://github.com/sgl-project/sglang/pull/6404

Add docker build: https://github.com/sgl-project/sglang/pull/6458