
Benchmark: Performance is lower than vLLM

Open zhaotyer opened this issue 1 year ago • 1 comment

Test environment: 1x A100 80GB | vllm==0.2.6+cu118 | deepspeed-mii==0.2.0 | Llama-2-7b-chat-hf
Benchmark script: https://github.com/microsoft/DeepSpeedExamples/tree/master/benchmarks/inference/mii

Test Result: [screenshot of benchmark results]

Why is the performance lower than vLLM's?

zhaotyer avatar Jan 30 '24 07:01 zhaotyer

Hi @zhaotyer, could you provide some additional information about how you collected these numbers? Are you running the benchmark in our DeepSpeedExamples repo?

If so, are you gathering these numbers directly from the resulting log files?
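When comparing engines from raw log files, it helps to summarize per-request latencies consistently for both. A minimal sketch of such a summary, assuming a hypothetical log format with one `latency: <seconds>` entry per request (this is not the actual DeepSpeedExamples log format, which writes structured result files):

```python
import re
import statistics

def summarize_latencies(log_text):
    """Parse per-request latencies (seconds) from lines of the assumed
    form 'latency: 1.234' and return basic summary statistics."""
    latencies = [float(m) for m in re.findall(r"latency:\s*([\d.]+)", log_text)]
    if not latencies:
        return None
    latencies.sort()
    return {
        "count": len(latencies),
        "mean": statistics.mean(latencies),
        # Median via the sorted list; fine for a quick benchmark summary.
        "p50": latencies[len(latencies) // 2],
        # Tail latency; falls back to the max for small sample sizes.
        "p95": latencies[int(len(latencies) * 0.95) - 1] if len(latencies) >= 20 else latencies[-1],
    }

sample_log = "latency: 1.0\nlatency: 2.0\nlatency: 3.0\n"
print(summarize_latencies(sample_log))
```

Summarizing both engines' logs with the same function avoids apples-to-oranges comparisons (e.g. mean latency from one tool vs. median from another).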

I just ran the Llama-2-7b model on 1x A6000 GPU with prompt size 256 and generation size 256 for 1, 2, 4, 8, 16, and 32 clients, and I'm seeing roughly equal performance for vLLM and FastGen (DeepSpeed-MII): [screenshot of benchmark results]

This is expected for the current release. FastGen provides better performance with longer prompts and shorter generation lengths. We go into greater detail on the performance and benchmarks in the two FastGen release blogs here and here.
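One way to build intuition for why the prompt/generation shape matters is a toy two-phase cost model: prompt-processing (prefill) optimizations can only help with the share of request time actually spent in prefill, and that share grows with longer prompts and shorter generations. The rates below are illustrative assumptions, not measured numbers for FastGen or vLLM:

```python
def prefill_fraction(prompt_tokens, gen_tokens, prefill_rate, decode_rate):
    """Fraction of end-to-end request time spent in the prefill phase
    under a toy cost model: total time = prefill time + decode time.
    Rates are assumptions for illustration only."""
    prefill_time = prompt_tokens / prefill_rate   # process the whole prompt at once
    decode_time = gen_tokens / decode_rate        # generate output tokens one by one
    return prefill_time / (prefill_time + decode_time)

# Assumed rates: 5000 prompt tok/s (prefill), 50 gen tok/s (decode).
long_prompt = prefill_fraction(1200, 128, 5000, 50)  # long prompt, short generation
balanced = prefill_fraction(256, 256, 5000, 50)      # the 256/256 workload above
print(f"{long_prompt:.3f} vs {balanced:.3f}")
```

Under these assumed rates, the long-prompt/short-generation request spends roughly 8.6% of its time in prefill versus about 1% for the 256/256 case, so a workload like the one benchmarked here leaves little room for prefill-side gains to show up.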

mrwyattii avatar Feb 02 '24 01:02 mrwyattii