
[V1] Prompt logprobs + APC compatibility; prompt logprobs reqs cannot fill APC

Open afeldman-nm opened this issue 9 months ago • 6 comments

Fix the issue wherein APC (automatic prefix caching) must be disabled on a V1 engine instance in order to support requests with prompt logprobs enabled. Solve this issue through "near-full recomputation" of prompt logprobs; see the discussion in #13414 for why full recomputation is desirable. A prompt logprobs request will not be able to hit the prefix cache. However, prompt logprobs requests can still be batched together with other requests, allowing some savings on attention computation; thus, "near-full" recomputation.

Note that although a prompt logprobs request cannot hit the prefix cache, it will still cache its prompt KVs for use by other, non-prompt-logprobs requests. This works because vLLM V1 APC accommodates redundant cached blocks.

When an engine instance with APC enabled receives a request with prompt logprobs enabled, the scheduler will bypass the APC cache-hit check. It is critical that when a prompt logprobs request is moved from WAITING to RUNNING, num_computed_tokens=0 for that request.
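The scheduling behavior above can be sketched as follows. This is an illustrative toy, not vLLM's actual API: the names `PrefixCache`, `Request`, and `schedule_prefill` are hypothetical stand-ins for the real scheduler and KV cache manager, and the block-hashing is simplified to caching token-ID prefixes directly.

```python
# Hedged sketch of the APC-bypass rule for prompt logprobs requests.
# All names here are illustrative; only num_computed_tokens mirrors the
# field discussed in the PR description.
from dataclasses import dataclass

BLOCK = 4  # toy KV block size


class PrefixCache:
    """Toy APC: remembers block-aligned prompt prefixes."""

    def __init__(self):
        self._blocks = set()

    def insert(self, token_ids):
        # Cache every complete block-aligned prefix of the prompt.
        for end in range(BLOCK, len(token_ids) + 1, BLOCK):
            self._blocks.add(tuple(token_ids[:end]))

    def longest_cached_prefix(self, token_ids):
        # Length of the longest block-aligned prefix already cached.
        best = 0
        for end in range(BLOCK, len(token_ids) + 1, BLOCK):
            if tuple(token_ids[:end]) in self._blocks:
                best = end
        return best


@dataclass
class Request:
    prompt_token_ids: list
    prompt_logprobs: bool = False
    num_computed_tokens: int = 0


def schedule_prefill(req: Request, cache: PrefixCache) -> int:
    """Return how many prompt tokens must actually be computed."""
    if req.prompt_logprobs:
        # Bypass the APC hit check: every prompt position is recomputed
        # so its logprob can be produced ("near-full recomputation").
        req.num_computed_tokens = 0
    else:
        # Normal path: skip as many cached prefix tokens as possible.
        req.num_computed_tokens = cache.longest_cached_prefix(
            req.prompt_token_ids)
    # Either way the request's prompt KVs are cached afterwards, so later
    # non-logprobs requests can still hit this prefix (redundant cached
    # blocks are tolerated).
    cache.insert(req.prompt_token_ids)
    return len(req.prompt_token_ids) - req.num_computed_tokens
```

With this rule, a repeated prompt gets a cache hit only when prompt logprobs are off; with them on, `num_computed_tokens` stays 0 even though the prefix is cached.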

FIX #13414

afeldman-nm avatar Feb 27 '25 06:02 afeldman-nm

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which starts with a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

github-actions[bot] avatar Feb 27 '25 06:02 github-actions[bot]

Just confirm: Is this ready for review?

comaniac avatar Mar 03 '25 17:03 comaniac

> Just confirm: Is this ready for review?

Hi Cody, I am trying to have this ready for review by EOD. Ideally I want to add additional unit test(s).

afeldman-nm avatar Mar 03 '25 17:03 afeldman-nm

Sounds good. Please let me know when it's ready for review and I'll prioritize it.

comaniac avatar Mar 03 '25 17:03 comaniac

@afeldman-nm I think this test needs to be updated too? https://github.com/vllm-project/vllm/blob/main/tests/v1/engine/test_async_llm.py#L78

njhill avatar Mar 05 '25 22:03 njhill

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @afeldman-nm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Mar 06 '25 18:03 mergify[bot]

> @afeldman-nm I think this test needs to be updated too? https://github.com/vllm-project/vllm/blob/main/tests/v1/engine/test_async_llm.py#L78

Yes absolutely.

afeldman-nm avatar Mar 07 '25 22:03 afeldman-nm

@comaniac FYI the unit tests are in, so we can move forward.

afeldman-nm avatar Mar 07 '25 23:03 afeldman-nm

Can you update the V1 Guide with regard to this?

DarkLight1337 avatar Jun 20 '25 04:06 DarkLight1337