llm-inference-benchmark

Why is inference FTL@1 (first-token latency) longer after quantization with the vLLM framework?

luhairong11 opened this issue 10 months ago · 2 comments

[two screenshots of benchmark results attached]

luhairong11 · Apr 02 '24

vLLM has already fixed this issue.

I will retest soon.

ninehills · Apr 02 '24

@ninehills Is there any update on this? Or could you tell me in which version of vLLM this issue was resolved?

With vLLM 0.4.3, my tests show that the quantized version's TTFT is still higher (worse) than the non-quantized version's.
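
For reference, here is a minimal sketch of how such a TTFT comparison can be run offline with vLLM. The model names are placeholders, and generating with `max_tokens=1` is only an approximation of time-to-first-token:

```python
# Minimal sketch: compare approximate TTFT between an unquantized and a
# quantized model in vLLM. Model/checkpoint names below are placeholders;
# substitute your own. Timing a generate() call with max_tokens=1
# approximates time-to-first-token for offline inference.
import time

from vllm import LLM, SamplingParams

PROMPT = "Explain the difference between latency and throughput."


def measure_ttft(llm: LLM, prompt: str, runs: int = 5) -> float:
    """Average wall-clock time to produce the first output token."""
    params = SamplingParams(max_tokens=1)
    llm.generate([prompt], params)  # warmup run, excluded from timing
    start = time.perf_counter()
    for _ in range(runs):
        llm.generate([prompt], params)
    return (time.perf_counter() - start) / runs


# Unquantized FP16 baseline (placeholder model name).
fp16 = LLM(model="meta-llama/Llama-2-7b-chat-hf")
print(f"FP16 TTFT: {measure_ttft(fp16, PROMPT):.3f}s")
# vLLM may not fully release GPU memory within one process; in practice
# it is safer to benchmark each model in a separate process/run.
del fp16

# AWQ-quantized variant (placeholder checkpoint name).
awq = LLM(model="TheBloke/Llama-2-7B-Chat-AWQ", quantization="awq")
print(f"AWQ TTFT:  {measure_ttft(awq, PROMPT):.3f}s")
```

For serving workloads, the same comparison can be made against the OpenAI-compatible server by streaming and timing the arrival of the first chunk, which reflects TTFT under realistic request handling.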

cyc00518 · Jul 07 '24