10jin-yidiandian comments




10jin-yidiandian

[Bug]: `size_k must divisible by BLOCK_SIZE_K` error when using tensor parallelism with AWQ-quantized MoE models

> > For errors caused by a tensor-parallel degree that is too high, you can set the `--enable-expert-parallel` parameter. See: [#17327](https://github.com/vllm-project/vllm/issues/17327)

> > Deploying a model with such settings might reduce the inference efficiency.

Is there any reference?
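
For anyone landing here from the error message, below is a rough sketch of why a higher tensor-parallel degree can trip a divisibility check like the one in the title. It is not vLLM's code: the function names and the numbers (`SIZE_K = 1536`, `BLOCK_SIZE_K = 128`) are hypothetical and chosen only for illustration, and it assumes each expert's K dimension is split evenly across ranks under tensor parallelism.

```python
# Illustrative only: hypothetical helpers and values, not vLLM internals.

def per_rank_k(size_k: int, tp_size: int) -> int:
    """K dimension held by each rank when one weight is split across tp_size ranks."""
    if size_k % tp_size != 0:
        raise ValueError(f"size_k={size_k} cannot be split evenly across {tp_size} ranks")
    return size_k // tp_size


def divisible_by_block(size_k: int, tp_size: int, block_size_k: int) -> bool:
    """True if the per-rank K dimension is a multiple of the kernel block size."""
    return per_rank_k(size_k, tp_size) % block_size_k == 0


SIZE_K = 1536        # hypothetical expert K dimension
BLOCK_SIZE_K = 128   # hypothetical kernel block size

for tp in (1, 2, 4, 8):
    shard = per_rank_k(SIZE_K, tp)
    ok = divisible_by_block(SIZE_K, tp, BLOCK_SIZE_K)
    print(f"tp={tp}: per-rank K={shard}, divisible by {BLOCK_SIZE_K}: {ok}")
```

With these made-up numbers the check stops passing once the per-rank slice gets smaller than a multiple of the block size. As far as I understand, expert parallelism places whole experts on different ranks instead of slicing each expert's weights, which would be why the flag sidesteps the check; whether it costs throughput is the open question above.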