inference
Qwen1.5-chat 72B int4 on 4 GPUs (V100): inference fails with an OOM error once the token count reaches about 10k
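A rough back-of-the-envelope sketch of why this can happen: int4 quantization shrinks the weights, but the KV cache is typically still kept in fp16 and grows linearly with the number of tokens. The function `kv_cache_bytes` below is a hypothetical helper, and its defaults (80 layers, 64 KV heads, head dimension 128, 2 bytes per value) are assumptions about the 72B model that should be checked against the model's own `config.json`. It ignores activations, framework overhead, and fragmentation.

```python
# Hypothetical helper: estimates KV cache size only, not total GPU memory use.
# The default shape values are ASSUMPTIONS about the 72B model; verify them
# against the model's config.json before trusting the numbers.
def kv_cache_bytes(num_tokens, num_layers=80, num_kv_heads=64,
                   head_dim=128, bytes_per_value=2, batch_size=1):
    # Leading factor 2: one key tensor and one value tensor per layer.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * num_tokens * batch_size


GIB = 1024 ** 3

if __name__ == "__main__":
    for n in (1_000, 10_000, 32_000):
        print(f"{n:>6} tokens: {kv_cache_bytes(n) / GIB:.1f} GiB of KV cache")
```

Under these assumptions the cache costs 2.5 MiB per token, so 10k tokens need roughly 24.4 GiB on top of the quantized weights, which may be enough to exhaust four V100s depending on whether they are the 16 GB or 32 GB variant.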
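May I ask what speed you are getting, in tokens/s? I am deploying the 32k int4 model, and both AWQ and GPTQ give me less than 1 token/s. I'm quite puzzled.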
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 5 days since being marked as stale.