llm-inference-benchmark

Why is inference FTL@1 (first-token latency) longer after quantization with the vLLM framework?

luhairong11 opened this issue 10 months ago · 2 comments

[two screenshots of benchmark results attached]

luhairong11 · Apr 02 '24

vLLM has already fixed this issue.

I will retest soon.

ninehills · Apr 02 '24

@ninehills Is there any update on this? Or could you tell me in which version of vLLM this issue was resolved?

With vLLM 0.4.3, my tests show that the quantized version's TTFT is still higher (worse) than the non-quantized version's.
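
For reference, here is a minimal sketch of how such a TTFT comparison can be run offline with vLLM. The model names are placeholders, and generating with `max_tokens=1` is only an approximation of time-to-first-token:

```python
# Minimal sketch: compare approximate TTFT between an unquantized and a
# quantized model in vLLM. Model/checkpoint names below are placeholders;
# substitute your own. Timing a generate() call with max_tokens=1
# approximates time-to-first-token for offline inference.
import time

from vllm import LLM, SamplingParams

PROMPT = "Explain the difference between latency and throughput."


def measure_ttft(llm: LLM, prompt: str, runs: int = 5) -> float:
    """Average wall-clock time to produce the first output token."""
    params = SamplingParams(max_tokens=1)
    llm.generate([prompt], params)  # warmup run, excluded from timing
    start = time.perf_counter()
    for _ in range(runs):
        llm.generate([prompt], params)
    return (time.perf_counter() - start) / runs


# Unquantized FP16 baseline (placeholder model name).
fp16 = LLM(model="meta-llama/Llama-2-7b-chat-hf")
print(f"FP16 TTFT: {measure_ttft(fp16, PROMPT):.3f}s")
# vLLM may not fully release GPU memory within one process; in practice
# it is safer to benchmark each model in a separate process/run.
del fp16

# AWQ-quantized variant (placeholder checkpoint name).
awq = LLM(model="TheBloke/Llama-2-7B-Chat-AWQ", quantization="awq")
print(f"AWQ TTFT:  {measure_ttft(awq, PROMPT):.3f}s")
```

For serving workloads, the same comparison can be made against the OpenAI-compatible server by streaming and timing the arrival of the first chunk, which reflects TTFT under realistic request handling.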

cyc00518 · Jul 07 '24