xunmenglt comments

Results 4 comments of


                                            xunmenglt

How to use multiple Ascend NPUs?

请问问题解决了吗，我现在只能多卡加载模型，但是还是只能单卡推理

npu-910-glm4 Generated Answer Generates Other Languages or Strings

你部署接口的时候指定了模板名称吗，需要指定模板名称

npu-910-glm4 Generated Answer Generates Other Languages or Strings

python3 -m fastchat.serve.model_worker --host 0.0.0.0 --port 21001 --worker-address http://0.0.0.0:21001/ --controller-address http://0.0.0.0:20001/ --model-names "glm-4-9b-chat-1m" --model-path /home/LLM/glm-4-9b-chat-1m --device npu --conv-template chatglm3 你在最后加上 --conv-template chatglm3 这个试试，我记得glm4的对话模板和chatglm3的模板一样如果还是不行的话可以更改fastchat/conversation.py文件，模仿下面这个代码创建一个对话模板 ![image](https://github.com/user-attachments/assets/c5b09183-0ab4-4012-92f2-215f51076abe)

ascend NPU how to Multiple NPUs

你解决了这个问题吗