Qwen1.5-chat 72B int4 on 4 GPUs (V100): inference fails with an OOM error once the token count reaches 10k

Open EthanD4869 opened this issue 1 year ago • 1 comment

[Two images attached in the original issue; not reproduced here.]

EthanD4869, Apr 30 '24 10:04
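
For context, a rough back-of-the-envelope estimate of KV-cache size may help explain why memory runs out at long context lengths. The sketch below is not taken from the issue. It assumes a model with 80 layers, 64 key/value heads, and a head dimension of 128, with the cache stored in fp16 (2 bytes per element); these figures are assumptions about Qwen1.5-72B and should be checked against the model's `config.json`. The function name `kv_cache_bytes` is hypothetical.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim,
                   bytes_per_elem=2, batch_size=1):
    """Estimate KV-cache size in bytes for a decoder-only transformer.

    Each token stores one key and one value vector (hence the factor 2)
    per layer, each of size num_kv_heads * head_dim.
    """
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * num_tokens * batch_size


# Assumed architecture values (not confirmed by the issue).
ASSUMED_LAYERS = 80
ASSUMED_KV_HEADS = 64
ASSUMED_HEAD_DIM = 128

total = kv_cache_bytes(10_000, ASSUMED_LAYERS, ASSUMED_KV_HEADS, ASSUMED_HEAD_DIM)
print(f"KV cache for 10k tokens: {total / 2**30:.1f} GiB")
```

Under these assumptions the cache alone needs about 2.5 MiB per token, or roughly 24 GiB at 10k tokens, on top of the int4 weights and activation buffers. Weight quantization does not shrink the KV cache, so long sequences can exhaust GPU memory even when the quantized weights fit comfortably.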

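May I ask what speed you are getting, in tokens/s? I am deploying the 32k int4 model, and both AWQ and GPTQ give me less than 1 token/s. I am quite puzzled.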

Channingss, Apr 30 '24 16:04

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot], Aug 06 '24 19:08

This issue was closed because it has been inactive for 5 days since being marked as stale.

github-actions[bot], Aug 12 '24 03:08