DeepSpeed-MII
Benchmark: Performance is lower than vLLM
Test environment: 1 x A100 80G | vllm==0.2.6+cu118 | deepspeed-mii==0.2.0 | Llama-2-7b-chat-hf
Script: https://github.com/microsoft/DeepSpeedExamples/tree/master/benchmarks/inference/mii
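For reference, this is roughly how the same model can be exercised through DeepSpeed-MII outside the benchmark harness; a minimal sketch, where the model path and `max_new_tokens` are illustrative rather than the benchmark's exact configuration:

```python
# Minimal DeepSpeed-MII (FastGen) serving sketch for the same model.
# The model path and generation length are illustrative assumptions.
import mii

client = mii.serve("meta-llama/Llama-2-7b-chat-hf")
response = client.generate(["DeepSpeed is"], max_new_tokens=256)
print(response)
client.terminate_server()
```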
Test Result:
Why is the performance lower than vLLM?
Hi @zhaotyer could you provide some additional information about how you collected these numbers? Are you running the benchmark in our DeepSpeedExamples repo?
If so, are you gathering these numbers directly from the resulting log files?
I just ran the Llama-2-7b model on 1xA6000 GPU with prompt size 256 and generation size 256 for 1, 2, 4, 8, 16, and 32 clients, and I'm seeing roughly equal performance for vLLM and FastGen (DeepSpeed-MII).
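If you want to approximate that client sweep without the full harness, here is a rough sketch. The deployment name, prompt construction, and per-thread clients are assumptions; the actual benchmark scripts in the repo use separate client processes and should be preferred for real measurements:

```python
# Rough sketch of a client-concurrency sweep against a persistent MII
# deployment started elsewhere with mii.serve("meta-llama/Llama-2-7b-chat-hf").
# Prompt length, token counts, and the threading model are assumptions.
import time
from concurrent.futures import ThreadPoolExecutor

import mii

PROMPT = "word " * 256  # crude stand-in for a ~256-token prompt

def one_request(_):
    client = mii.client("meta-llama/Llama-2-7b-chat-hf")
    start = time.time()
    client.generate([PROMPT], max_new_tokens=256)
    return time.time() - start

for n_clients in (1, 2, 4, 8, 16, 32):
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        latencies = list(pool.map(one_request, range(n_clients)))
    print(f"{n_clients} clients: mean latency {sum(latencies)/len(latencies):.2f}s")
```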
This is expected for the current release. FastGen provides better performance with longer prompts and shorter generation lengths. We go into greater detail on performance and benchmarks in the two FastGen release blogs here and here.
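As an illustration of that regime, a long-prompt / short-generation request looks like the sketch below; the prompt text and token counts are made up for the example:

```python
# Hedged example of the long-prompt / short-generation workload where
# FastGen's Dynamic SplitFuse scheduling is expected to help most.
# The prompt construction and lengths here are illustrative only.
import time

import mii

client = mii.serve("meta-llama/Llama-2-7b-chat-hf")

long_prompt = "DeepSpeed enables efficient inference. " * 200  # long context
start = time.time()
client.generate([long_prompt], max_new_tokens=32)  # short generation
print(f"end-to-end latency: {time.time() - start:.2f}s")

client.terminate_server()
```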