swift 将vllm版本从0.3.1升级到0.4.0后，使用swift部署模型，服务请求时间明显变长

将vllm版本从0.3.1升级到0.4.0后，使用swift部署模型，服务请求时间明显变长

Open HIT-Owen opened this issue 2 months ago • 3 comments

Describe the bug What the bug is, and how to reproduce, better with screenshots(描述bug以及复现过程，最好有截图) 将vllm版本从0.3.1升级到0.4.0后，使用swift部署模型，在相同模型、相同prompt的情况下，服务请求时间明显变长（2倍以上），server部署命令参数没有做任何修改

CUDA_VISIBLE_DEVICES=1 swift deploy --model_type qwen1half-7b-chat
--model_cache_dir /data/ssd/LLM_models/qwen/Qwen1.5-7B-Chat
--infer_backend vllm
--use_flash_attn true
--host 0.0.0.0
--port 8000
--max_new_tokens 512
--temperature 0.3
--top_p 0.7
--repetition_penalty 1.0

Your hardware and system info Write your system info like CUDA version/system/GPU/torch version here(在这里给出硬件信息和系统信息，如CUDA版本，系统，GPU型号和torch版本等) 系统：ubuntu22.04 CUDA Version: 12.3 GPU型号：NVIDIA H100 80GB torch版本：2.1.2 transformers版本：4.39.3 swift版本：2.1.0.dev0

Additional context Add any other context about the problem here(在这里补充其他信息)

Apr 24 '24 06:04 HIT-Owen

我来查一下原因

Apr 24 '24 07:04 Jintao-Huang

我也发现了回退到原来版本了

Apr 29 '24 10:04 zhangfan-algo

这种问题的最终解决办法是解耦。swift训完的模型，怎么转回的基座的样式，让大家想怎么弄怎么弄。

https://github.com/modelscope/swift/issues/838

Apr 29 '24 10:04 eigen2017

swift swift copied to clipboard

将vllm版本从0.3.1升级到0.4.0后，使用swift部署模型，服务请求时间明显变长

swift
swift copied to clipboard