Baizhou Zhang
It's recommended to compare the performance with MTP enabled and disabled using the following script:

```bash
python3 python/sglang/test/send_one.py
```

Please paste the results.
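One way to summarize such a comparison is a single throughput ratio. The sketch below assumes each run (MTP disabled, MTP enabled) yields one output throughput figure in tokens per second; the function name `speedup` and the numbers are hypothetical placeholders, not measured results.

```python
def speedup(baseline_tok_per_s: float, mtp_tok_per_s: float) -> float:
    """Return the throughput ratio of the MTP run over the baseline run."""
    if baseline_tok_per_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return mtp_tok_per_s / baseline_tok_per_s


# Placeholder values for illustration only; replace with the numbers
# observed from the two runs.
baseline = 100.0   # tokens/s with MTP disabled (hypothetical)
with_mtp = 150.0   # tokens/s with MTP enabled (hypothetical)
print(f"speedup: {speedup(baseline, with_mtp):.2f}x")
```

A ratio above 1.0 means the MTP run was faster; posting both raw numbers alongside the ratio keeps the comparison easy to verify.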
@quinnrong94 The added tests fail in CI. Please have a look.
Hi @quinnrong94 , can you take a look at this CI failure? https://github.com/sgl-project/sglang/actions/runs/14996032913/job/42130798605?pr=6109
> > Hi @quinnrong94 , can you take a look at this CI fail? https://github.com/sgl-project/sglang/actions/runs/14996032913/job/42130798605?pr=6109
>
> Hi @Fridge003 , I saw flashMLA test failed in CI, I wonder if...
For future PRs:
- Do some profiling and check whether there is any bubble caused by synchronization between CPU & GPU
- Support speculative-num-steps > 1
- Support topk >...
Hi @isky-cd , #3424 seems to be fixed by PR #3709. Could you please pull the latest branch and check whether the bug is resolved?
Please try uninstalling flashinfer and reinstalling it with `pip install -e "python[all]"`.
@MichoChan Hi, we recently added a Triton backend for LoRA in #3161. Setting the `--lora-backend` argument to `triton` should improve serving performance. In the future we will support...
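As a rough sketch of where the flag goes, the snippet below assembles a server launch command with `--lora-backend triton`. The helper `build_launch_command` and both paths are hypothetical placeholders; it also assumes the server is started through `sglang.launch_server` with `--model-path` and `--lora-paths`, so adjust it to match the actual setup.

```python
import shlex


def build_launch_command(model_path, lora_path, backend="triton"):
    """Assemble the launch arguments as a list (hypothetical helper)."""
    return [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,    # placeholder base model path
        "--lora-paths", lora_path,     # placeholder LoRA adapter path
        "--lora-backend", backend,     # flag named in the comment above
    ]


cmd = build_launch_command("path/to/base-model", "path/to/lora-adapter")
print(shlex.join(cmd))
```

Building the arguments as a list and joining them with `shlex.join` keeps paths that contain spaces safely quoted when the command is pasted into a shell.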
cc @zhyncs Could you please have a look?