Qwen2.5
[BUG] Requests stop returning after running fastchat + vllm inference for some time
Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
Is there an existing answer for this in the FAQ?
- [X] I have searched the FAQ
Current Behavior
After running for some time, the model stops returning responses.
Expected Behavior
The service should run continuously without interruption.
Steps To Reproduce
Deploy following fastchat's standard setup:
```
python -m fastchat.serve.controller
python -m fastchat.serve.vllm_worker --model-path .cache/modelscope/hub/qwen/Qwen1.5-72B-Chat/ --trust-remote-code --tensor-parallel-size 8 --gpu-memory-utilization 0.98 --dtype bfloat16 --model-names qwen-1.5_nat_agi_72b_1.1 --limit-worker-concurrency 20
python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port
```
After running for some time (the duration varies by machine), the model stops returning responses and requests simply hang.
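To make the hang observable from the client side, here is a minimal probe sketch. The endpoint URL is an assumption (the `--port` value was omitted in the launch command above), and the model name matches the `--model-names` value used for the worker:
```python
# Hypothetical client-side probe: repeatedly sends a small request so a hung
# worker shows up as a timeout instead of blocking forever.
# Assumes openai_api_server listens on http://localhost:8000 (not confirmed above).
import time
import requests

API_URL = "http://localhost:8000/v1/chat/completions"
PAYLOAD = {
    "model": "qwen-1.5_nat_agi_72b_1.1",
    "messages": [{"role": "user", "content": "ping"}],
    "max_tokens": 8,
}

while True:
    try:
        # A hard client-side timeout turns a stuck request into a visible failure.
        resp = requests.post(API_URL, json=PAYLOAD, timeout=60)
        print(time.strftime("%H:%M:%S"), "status", resp.status_code)
    except requests.exceptions.Timeout:
        print(time.strftime("%H:%M:%S"), "request timed out -- worker may be hung")
    time.sleep(30)
```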
Environment
- OS: ubuntu 20.04
- Python: 3.10.9
- Transformers: 4.37.2
- PyTorch: 2.1.2
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.1
- vllm: 0.3.0
- Hardware: 8x A100 (40GB)
Anything else?
No response
Does fastchat support Qwen1.5?
Yes, it is supported.
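For reference, a minimal sketch of calling a Qwen1.5 model served this way through fastchat's OpenAI-compatible server, using the openai>=1.0 client. The base URL is an assumption; adjust it to the port passed to openai_api_server:
```python
# Sketch of querying the fastchat OpenAI-compatible endpoint (assumed base_url).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="qwen-1.5_nat_agi_72b_1.1",
    messages=[{"role": "user", "content": "Hello, who are you?"}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```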
This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.