Qwen2.5
[BUG] Requests stop returning after running fastchat + vllm inference for some time
Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
Is there an existing answer for this in the FAQ?
- [X] I have searched the FAQ
Current Behavior
After running for some time, the model stops returning responses.
Expected Behavior
The service should run continuously without interruption.
Steps To Reproduce
Deploy following fastchat's standard setup:
```
python -m fastchat.serve.controller
python -m fastchat.serve.vllm_worker --model-path .cache/modelscope/hub/qwen/Qwen1.5-72B-Chat/ --trust-remote-code --tensor-parallel-size 8 --gpu-memory-utilization 0.98 --dtype bfloat16 --model-names qwen-1.5_nat_agi_72b_1.1 --limit-worker-concurrency 20
python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port
```
After running for some time (the duration varies by machine), the model stops returning responses and requests simply hang.
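To make the hang observable from the client side, here is a minimal probe sketch. The endpoint URL is an assumption (the `--port` value was omitted in the launch command above), and the model name matches the `--model-names` value used for the worker:
```python
# Hypothetical client-side probe: repeatedly sends a small request so a hung
# worker shows up as a timeout instead of blocking forever.
# Assumes openai_api_server listens on http://localhost:8000 (not confirmed above).
import time
import requests

API_URL = "http://localhost:8000/v1/chat/completions"
PAYLOAD = {
    "model": "qwen-1.5_nat_agi_72b_1.1",
    "messages": [{"role": "user", "content": "ping"}],
    "max_tokens": 8,
}

while True:
    try:
        # A hard client-side timeout turns a stuck request into a visible failure.
        resp = requests.post(API_URL, json=PAYLOAD, timeout=60)
        print(time.strftime("%H:%M:%S"), "status", resp.status_code)
    except requests.exceptions.Timeout:
        print(time.strftime("%H:%M:%S"), "request timed out -- worker may be hung")
    time.sleep(30)
```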
Environment
- OS: ubuntu 20.04
- Python: 3.10.9
- Transformers: 4.37.2
- PyTorch: 2.1.2
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.1
- vllm: 0.3.0
- Hardware: 8x A100 (40GB)
Anything else?
No response
Does fastchat support Qwen1.5?
Yes, it is supported.
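For reference, a minimal sketch of calling a Qwen1.5 model served this way through fastchat's OpenAI-compatible server, using the openai>=1.0 client. The base URL is an assumption; adjust it to the port passed to openai_api_server:
```python
# Sketch of querying the fastchat OpenAI-compatible endpoint (assumed base_url).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="qwen-1.5_nat_agi_72b_1.1",
    messages=[{"role": "user", "content": "Hello, who are you?"}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```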
This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.