DeepSpeed-MII
Performance compared with vLLM
Hi, I benchmarked MII and vLLM on an A100 with the Yi-6B model, and vLLM (5.12 s/query) comes out faster than MII (6.08 s/query). Is there any config I need to set?
Here is my setup:
- input len = 1536
- output len = 512
- batch size = 1
- test set size: 100
- the warmup stage is not included in the timing statistics
The model loading and inference code is as follows.

```python
import mii

model_path = "/mnt/bn/multimodel/models/official/Yi-6B-Chat/"
pipe = mii.pipeline(model_path, torch_dist_port=12345)
# `prompt` is one query from the test set; generation is pinned to 512 tokens
resp = pipe([prompt], min_new_tokens=512, max_new_tokens=512)
```
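For reference, a minimal sketch of how the per-query latency above might be measured under the settings listed (the `load_test_prompts` helper, the warmup count, and the use of `time.perf_counter` are assumptions, not taken from the original post):

```python
import time

# Hypothetical helper returning the 100 benchmark prompts (~1536 input tokens each)
prompts = load_test_prompts()

# Warmup queries, excluded from the timing statistics
for prompt in prompts[:3]:
    pipe([prompt], min_new_tokens=512, max_new_tokens=512)

# Timed run over the full test set, batch size 1
start = time.perf_counter()
for prompt in prompts:
    pipe([prompt], min_new_tokens=512, max_new_tokens=512)
elapsed = time.perf_counter() - start
print(f"{elapsed / len(prompts):.2f} s/query")
```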
@littletomatodonkey - mii.pipeline is intended as a quick-start API, so its performance may not be optimal.
For better performance, please try the mii.serve API to create a persistent deployment.
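As a rough sketch of what that looks like (the model path and `max_new_tokens` value are carried over from the snippet above; other arguments are left at their defaults):

```python
import mii

# Start a persistent deployment: spawns a server process and returns a client
client = mii.serve("/mnt/bn/multimodel/models/official/Yi-6B-Chat/")

# Send queries to the running server; repeated calls reuse the same deployment
response = client.generate(["<your prompt here>"], max_new_tokens=512)
print(response)

# Shut down the deployment when finished
client.terminate_server()
```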