DeepSpeed-MII

0.6 req/s is kinda low, for real?

Open chuangzhidan opened this issue 7 months ago • 2 comments

We have one A100 that can support 2 requests, with a throughput of about 10 tokens/s, using just the KV cache technique. Your configuration with 4x A100 achieving only 0.6 req/s under vLLM seems way too low. I find it hard to believe.

chuangzhidan Nov 22 '23 03:11

Can you give more details about the model's architecture and size, and about how you benchmarked it? Some more details about the environment would help too.
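Those details matter because req/s and tokens/s are different units and only become comparable once the number of generated tokens per request is known. A minimal sketch of the conversion, assuming a fixed average output length per request (the token counts below are hypothetical, not taken from any benchmark in this thread):

```python
def to_tokens_per_second(req_per_s: float, tokens_per_request: float) -> float:
    """Convert a request rate into a token rate for a given average output length."""
    return req_per_s * tokens_per_request


def to_requests_per_second(tok_per_s: float, tokens_per_request: float) -> float:
    """Convert a token rate into a request rate for a given average output length."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    return tok_per_s / tokens_per_request


if __name__ == "__main__":
    # Hypothetical average output lengths; the real value depends on the benchmark.
    for tokens_per_request in (64, 256, 1024):
        tok_rate = to_tokens_per_second(0.6, tokens_per_request)
        req_rate = to_requests_per_second(10.0, tokens_per_request)
        print(
            f"{tokens_per_request} tokens/request: "
            f"0.6 req/s = {tok_rate:.1f} tokens/s, "
            f"10 tokens/s = {req_rate:.4f} req/s"
        )
```

Which of the two figures is higher depends entirely on the output length per request, so it has to be stated alongside either number.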

PawanOsman Nov 22 '23 06:11

@chuangzhidan please try with the latest main branch. I have made improvements that allow the RESTful API to match the performance of our Python API (see #328).

mrwyattii Nov 28 '23 00:11