Huapeng Zhou comments

Results: 11 comments by Huapeng Zhou

[Feature] grafana dashboard should work out of the box

I will take that together!

[Feature] grafana dashboard should work out of the box

Fixed by PR: https://github.com/sgl-project/sglang/pull/4718

[Feat] Support FlashMLA backend with MTP and FP8 KV cache

> Hi @quinnrong94, can you take a look at this CI fail? https://github.com/sgl-project/sglang/actions/runs/14996032913/job/42130798605?pr=6109

Hi @Fridge003, I saw that the FlashMLA test failed in CI. I wonder if it's due to...

[Feat] Enable PDL automatically on Hopper architecture

> Also please provide the performance benchmark after this enhancement

Yes, there is another guy who is testing the performance!

[Feat] Enable PDL automatically on Hopper architecture

Still working bro

> > Also please provide the performance benchmark after this enhancement
>
> Yes, there is another guy who is testing the...

[Feat] Enable PDL automatically on Hopper architecture

Here is my benchmark (tested on H100).

Command:

    python3 -m sglang.bench_one_batch --model-path meta-llama/Llama-3.1-8B-Instruct --attention-backend fa3 --batch 16 --input-len 1024 --output-len 10

Before this PR:

After:

Thanks @Fridge003 for helping!


Added installation docs.

ITL metrics

ITL metrics