Just get it done.
@howl-anderson Ok, looking forward to your future results, thank you.
Hi @howl-anderson, is there any conclusion on this question? Thank you.
I ran into the same case.
Model size: 6B
GPU memory usage: 24 GB
GPU type: A100 40G
Latency per request: 3 s

When I use DeepSpeed for single-card inference, the QPS does not exceed 2, and the...
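For reference, here is a minimal sketch of what single-card DeepSpeed inference setup typically looks like, assuming a Hugging Face causal LM; the model name below is a placeholder, since the comment above does not name the model, and the generation parameters are illustrative only:

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-6b-model"  # placeholder; not specified in the original comment

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# mp_size=1 keeps everything on a single GPU;
# replace_with_kernel_inject enables DeepSpeed's fused inference kernels.
ds_engine = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)

inputs = tokenizer("Hello", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = ds_engine.module.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```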
Your method solved my problem, thanks.
I ran into the same problem. Could the maintainers please take a look?