Wuhan Zhang
Same here. I don't think 7b-base got slower; rather, Qwen1.5-7B-chat got faster.
I'm using FastChat.
I found that when I launch a Weights & Biases (wandb) service with simulated data alone, there are no issues with the service communication. However, when I simultaneously load a...
> I modified sglang's code, and it worked for me. Add this in sglang>srt>server.py line 143
>
> ```python
> try:
>     mp.set_start_method('spawn', force=True)
>     print("spawned")
> except RuntimeError:
>     ...
> ```
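For context, the quoted fix forces Python's multiprocessing `spawn` start method, which gives child processes a fresh interpreter instead of a forked copy of the parent (fork can break once CUDA is initialized in the parent). A minimal standalone sketch of the same idea, outside sglang:

```python
import multiprocessing as mp


def main():
    # Force the 'spawn' start method so worker processes start from a
    # fresh interpreter rather than inheriting parent state via fork.
    try:
        mp.set_start_method('spawn', force=True)
    except RuntimeError:
        # set_start_method may only be called once per program;
        # ignore the error if it was already configured.
        pass
    print(mp.get_start_method())  # → spawn


if __name__ == '__main__':
    main()
```

With `force=True` the call overrides any previously configured start method, which is why the quoted patch wraps it in a `try`/`except RuntimeError` purely as a safety net.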
I have encountered the same problem.
> > I have encountered the same problem.
>
> You can use vLLM directly for inference; I find it compatible with Mistral-Large-2.

Do you use the API?
> @liuanping @shangh1 @fuegoio All my package versions are listed above; as for vLLM, that is vllm==0.5.2. The inference code is quite simple. I'm using 4*H100 for Mistral-Large-2.
>
> > ...
Thank you for your reply. I've noticed what seems to be a blocking issue in this project: when a non-streaming request is made, other streaming requests get disconnected.
I have met the same issue. From my point of view, it happens in two scenarios: one is under heavy request pressure (like GraphRAG); the other is...
@RayneSun Which tool parser is used for deploying Qwen with vLLM?