[Performance]: the performance with chunked-prefill-enabled is lower than default
I tested vLLM's benchmarks/benchmark_throughput.py and found that performance with chunked prefill enabled is lower than the default. How can I deal with this problem?
Your current environment (if you think it is necessary)
export CUDA_VISIBLE_DEVICES=0
python3 ./benchmarks/benchmark_throughput.py \
--model /home/workspace/chatglm3-6b/ \
--tokenizer /home/workspace/chatglm3-6b/ \
--num-prompts 16 \
--input-len 1024 \
--output-len 256 \
--enable-chunked-prefill \
--trust-remote-code
Are the parameters set correctly?
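For context on how the flags interact: with chunked prefill, vLLM caps the number of tokens processed per scheduler step at --max-num-batched-tokens, so a long prompt may be prefilled over several steps instead of one. A minimal sketch of the resulting step count (the chunk-size values below are illustrative, not vLLM defaults):

```python
import math

def prefill_steps(prompt_len: int, max_num_batched_tokens: int) -> int:
    """Number of scheduler steps needed to prefill one prompt when
    each step is capped at max_num_batched_tokens."""
    return math.ceil(prompt_len / max_num_batched_tokens)

# The benchmark above uses --input-len 1024.
print(prefill_steps(1024, 512))   # prompt split across 2 steps
print(prefill_steps(1024, 2048))  # 1 step, same as an unchunked prefill
```

If the chunk budget is at least as large as the prompt, prefill completes in a single step and behaves like the default path.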
chunked_prefill_enable = False
INFO 09-01 12:46:11 async_llm_engine.py:268] 7cbe74f5c90c4a95954ae8b87d36a3c6 finished E2E: 0.29664182662963867, TTFT: 0.29621362686157227, TBT: 0.00042819976806640625, TIQ: 0.001392364501953125
INFO 09-01 12:46:15 async_llm_engine.py:268] 9bbc02b5dc904963a915612fc8951d0a finished E2E: 0.29630255699157715, TTFT: 0.2959132194519043, TBT: 0.00038933753967285156, TIQ: 0.0011632442474365234
chunked_prefill_enable = True
INFO 09-01 12:52:55 async_llm_engine.py:268] f4ce2ce1237146b79df1e698d6d70582 finished E2E: 0.3303070068359375, TTFT: 0.32995128631591797, TBT: 0.00035572052001953125, TIQ: 0.0012929439544677734
INFO 09-01 12:53:00 async_llm_engine.py:268] b03a99b525da4bfd8ef6ef1928030a6b finished E2E: 0.3486812114715576, TTFT: 0.3483591079711914, TBT: 0.00032210350036621094, TIQ: 0.0012426376342773438
With chunked prefill enabled, TTFT increases from ~296 ms to ~330 ms.
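One plausible explanation for the regression: each extra prefill step carries a fixed scheduling and kernel-launch overhead, so splitting the 1024-token prompt into chunks raises TTFT even though the per-token compute is unchanged. A toy model, with entirely made-up constants, just to show the shape of the effect:

```python
import math

def ttft_ms(prompt_len: int, chunk: int,
            per_token_ms: float = 0.28, per_step_ms: float = 15.0) -> float:
    """Toy TTFT model: linear per-token prefill cost plus a fixed cost
    per scheduler step. Constants are hypothetical, for illustration only."""
    steps = math.ceil(prompt_len / chunk)
    return prompt_len * per_token_ms + steps * per_step_ms

# More chunks means more per-step overhead before the first token appears.
print(ttft_ms(1024, 1024))  # single-step prefill
print(ttft_ms(1024, 512))   # chunked prefill: strictly higher TTFT
```

Under this reading, raising --max-num-batched-tokens (so the whole prompt fits in one chunk) should close most of the gap for an offline throughput benchmark; chunked prefill mainly pays off when decodes from other requests are interleaved with the prefill chunks, which this 16-prompt batch test does not exercise.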
me too!
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!