llm-inference-benchmark
Why is the inference FTL@1 longer after quantization with the vLLM framework?
vLLM has already fixed this issue.
I will retest soon.
@ninehills Is there any update on this? Or could you tell me in which version of vLLM this issue was resolved?
In vLLM 0.4.3, my tests show that the quantized version's TTFT is still slower than the non-quantized version's.
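For anyone trying to reproduce this comparison, here is a minimal sketch of how one could approximate TTFT offline with the vLLM Python API, by timing a generation that is limited to a single output token. The model names and the `awq` quantization flag are placeholder assumptions, not the checkpoints used in this issue, and this is not the benchmark's actual measurement harness:

```python
# Rough sketch: approximate TTFT by timing a single-token generation.
# Run each model in its own process; two engines may not fit on one GPU,
# and vLLM does not always release GPU memory when an LLM object is dropped.
import sys
import time
from typing import Optional

from vllm import LLM, SamplingParams

PROMPT = "Explain the difference between TTFT and throughput in one sentence."


def measure_ttft(model: str, quantization: Optional[str] = None) -> float:
    """Approximate TTFT as the wall-clock time to emit the first new token."""
    llm = LLM(model=model, quantization=quantization)
    one_token = SamplingParams(max_tokens=1)  # stop right after the first token
    llm.generate([PROMPT], one_token)         # warm-up call, excluded from timing
    start = time.perf_counter()
    llm.generate([PROMPT], one_token)
    return time.perf_counter() - start


if __name__ == "__main__":
    # e.g. python ttft_probe.py meta-llama/Llama-2-7b-hf
    #      python ttft_probe.py TheBloke/Llama-2-7B-AWQ awq
    model_name = sys.argv[1]
    quant = sys.argv[2] if len(sys.argv) > 2 else None
    print(f"{model_name}: TTFT ~= {measure_ttft(model_name, quant):.3f}s")
```

A single-token `generate` call folds the whole prefill into one timed step, so it is only a proxy for the streaming TTFT reported by the benchmark, but it is usually enough to see whether the quantized checkpoint's prefill is slower.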