GLM-130B

Inference time increases dramatically when starting two servers

Open shaochangxu opened this issue 2 years ago • 2 comments

Following the steps, I ran ../examples/pytorch/glm/glm_server.sh on 8x A100 GPUs, and inference on one sentence takes about 2 s.

But when I start two servers, A and B, inference time for the same input increases to 18 s after I send a request to server A and then to server B.

Are there any other settings I have missed? Looking forward to your reply!
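One thing worth checking is whether both server processes are visible to all eight GPUs and end up contending for the same devices. A minimal sketch of isolating the two instances with the standard `CUDA_VISIBLE_DEVICES` mechanism is below; the 4+4 GPU split and the reuse of `glm_server.sh` for both instances are assumptions, not the repository's documented multi-server setup:

```shell
#!/bin/bash
# Hypothetical sketch: pin each server instance to a disjoint set of GPUs so
# the two processes do not share devices. Adjust the masks and any port or
# tensor-parallel settings in glm_server.sh to match your configuration.

# Server A sees only GPUs 0-3.
CUDA_VISIBLE_DEVICES=0,1,2,3 bash ../examples/pytorch/glm/glm_server.sh &

# Server B sees only GPUs 4-7.
CUDA_VISIBLE_DEVICES=4,5,6,7 bash ../examples/pytorch/glm/glm_server.sh &

wait
```

If each instance was launched with tensor parallelism across all 8 GPUs, two such instances time-share every device, which could explain a large latency increase.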

shaochangxu avatar Mar 03 '23 10:03 shaochangxu

Are you running the fp16 model or the int4 one?

paperHZ avatar Mar 24 '23 06:03 paperHZ

The fp16 one.

shaochangxu avatar Apr 28 '23 06:04 shaochangxu