Baizhou Zhang

Results 79 comments of Baizhou Zhang

It's recommended to compare performance with MTP enabled and disabled using the following script:

```bash
python3 python/sglang/test/send_one.py
```

Please paste the results.
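When reporting the two runs, it can help to state the relative change next to the raw numbers. The following is a minimal sketch, assuming you have copied one throughput figure (tokens/s) from each run by hand. The function name `relative_speedup` and the example values are hypothetical placeholders, not output of `send_one.py`.

```python
def relative_speedup(baseline_tps: float, mtp_tps: float) -> float:
    """Return the throughput ratio of the MTP run over the baseline run."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return mtp_tps / baseline_tps


if __name__ == "__main__":
    # Placeholder values only (assumption); replace with your own measurements.
    baseline = 100.0  # tokens/s with MTP disabled (hypothetical)
    with_mtp = 150.0  # tokens/s with MTP enabled (hypothetical)
    print(f"speedup: {relative_speedup(baseline, with_mtp):.2f}x")
```

A ratio above 1.0 means the MTP run was faster. Posting the raw figures from both runs alongside the ratio keeps the comparison verifiable.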

@quinnrong94 The added tests are failing in CI. Please take a look.

Hi @quinnrong94, can you take a look at this CI failure? https://github.com/sgl-project/sglang/actions/runs/14996032913/job/42130798605?pr=6109

> > Hi @quinnrong94 , can you take a look at this CI fail? https://github.com/sgl-project/sglang/actions/runs/14996032913/job/42130798605?pr=6109
>
> Hi @Fridge003 , I saw flashMLA test failed in CI, I wonder if...

For Future PRs:
- Do some profiling and check whether there is any bubble caused by synchronization between CPU & GPU
- Support speculative-num-steps > 1
- Support topk >...

Hi @isky-cd, #3424 seems to be fixed by PR #3709. Could you please pull the latest branch and check whether the bug is resolved?

Please try uninstalling flashinfer, then run `pip install -e "python[all]"` to install it again.

@MichoChan Hi, we recently added a Triton backend for LoRA in #3161. Setting the `--lora-backend` argument to `triton` should improve serving performance. In the future we will support...