Comments of 10jin-yidiandian (1 result)
> > For errors caused by a tensor-parallel degree that is too high, you can set the `--enable-expert-parallel` parameter. Reference: [#17327](https://github.com/vllm-project/vllm/issues/17327)
>
> > Deploying a model with such settings might reduce the inference efficiency.

Is there any reference?