ganjiuwanshiya.

7 comments by ganjiuwanshiya.

I ran into the same case.

Model size: 6B; GPU memory in use: 24 GB; GPU type: A100 40G; latency per request: 3 s. When I use DeepSpeed for single-card inference, the QPS does not exceed 2, and the...

Your method solved my problem, thanks.

I ran into the same problem. Could the maintainers please take a look?