Reduce the size of block_table by getting rid of padding.

Open • vanbasten23 opened this issue 8 months ago • 1 comment

Reduce the size of block_table by getting rid of padding.

Test plan:

  1. $ VLLM_USE_V1=1 pytest -s -v vllm/tests/entrypoints/llm/test_accuracy.py::test_lm_eval_accuracy_v1_engine 2>&1 | tee out.txt
  2. $ VLLM_USE_V1=1 vllm serve meta-llama/Llama-3.1-8B-Instruct --disable-log-requests --port 8003 --gpu-memory-utilization 0.95 --max-num-batched-tokens 8192 --tensor-parallel-size 1 --max-model-len 2048 &
     $ python3 benchmark_serving.py --model meta-llama/Llama-3.1-8B-Instruct --dataset-name sonnet --dataset-path sonnet_4x.txt --num-prompts 1000 --sonnet-input-len 2000 --sonnet-output-len 128 --port 8003

vanbasten23, Mar 08 '25 01:03

NUM_KV_PAGES_PER_BLOCK is no longer used in tpu_model_runner.py after this change. Is that intentional?

Yeah, it's intentional. NUM_KV_PAGES_PER_BLOCK was used only for padding. Since we don't need to pad anymore, we no longer need it.
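
For readers who want a concrete picture of the padding being discussed, here is a minimal sketch of how rounding the block table's column count up to a multiple of NUM_KV_PAGES_PER_BLOCK inflates it. This is an illustration only and not the code in tpu_model_runner.py: the helper names (`cdiv`, `round_up`, `block_table_shape`) and the values for block size, padding multiple, and request count are all assumptions chosen for the example. Only the max model length of 2048 comes from the test plan.

```python
# Illustrative sketch only; not the actual vLLM implementation.
# All constants except MAX_MODEL_LEN are assumed values.
NUM_KV_PAGES_PER_BLOCK = 128  # assumed padding multiple
BLOCK_SIZE = 32               # assumed tokens per KV-cache page
MAX_MODEL_LEN = 2048          # from --max-model-len in the test plan
MAX_NUM_REQS = 8              # assumed number of concurrent requests


def cdiv(a: int, b: int) -> int:
    """Ceiling division for non-negative integers."""
    return -(-a // b)


def round_up(x: int, multiple: int) -> int:
    """Round x up to the nearest multiple of `multiple`."""
    return cdiv(x, multiple) * multiple


def block_table_shape(max_num_reqs, max_model_len, block_size, pad_to=None):
    """Return (rows, cols) of a block table, optionally padding the columns."""
    cols = cdiv(max_model_len, block_size)  # pages needed by the longest request
    if pad_to is not None:
        cols = round_up(cols, pad_to)       # the padding this PR removes
    return (max_num_reqs, cols)


padded = block_table_shape(MAX_NUM_REQS, MAX_MODEL_LEN, BLOCK_SIZE,
                           pad_to=NUM_KV_PAGES_PER_BLOCK)
unpadded = block_table_shape(MAX_NUM_REQS, MAX_MODEL_LEN, BLOCK_SIZE)

print("padded:  ", padded, "entries:", padded[0] * padded[1])
print("unpadded:", unpadded, "entries:", unpadded[0] * unpadded[1])
```

With these assumed numbers the unpadded table has half as many entries as the padded one. The real saving depends on the actual block size and padding multiple.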

vanbasten23, Mar 08 '25 02:03

Thanks @mgoin for the review!

vanbasten23, Mar 09 '25 21:03