DeepSeek-V2
Why is inference of DeepSeek-V2 with vLLM slow?
I use vLLM to run inference on DeepSeek-V2 and deploy the model with Flask. When a prompt enters the model, it always gets stuck for a long time at the "Processed prompts" step. The code I use is your example code.
https://huggingface.co/deepseek-ai/DeepSeek-V2/discussions/1 @ZzzybEric
What's your GPU type?