vllm
vllm copied to clipboard
[BugFix] Make DP work with connector-delayed new requests
The current DP engine logic assumes that the step() function will always perform a forward pass when there are any running and/or waiting requests. With the new async connector changes this might no longer be the case, specifically when there's only requests in WAITING_FOR_REMOTE_KV state.
These changes adjust the step() function to return a bool indicating whether a forward pass ran, and this is used to determine whether a dummy batch needs to run.
The set_forward_context function is also updated to avoid performing an all-reduce of DP metadata when it's not used in the context of a forward-pass. This is a minimally-invasive fix, but we should probably adjust the connector API to not use the forward context.
Thanks to @wseaton for related testing/experimentation and helping to figure out what changes were needed for this.
👋 Hi! Thank you for contributing to the vLLM project.
💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.
Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.
🚀
can we add the bool to EngineCoreOutputs?
@simon-mo I guess the main reason for not doing that is that the api-server-scaleout PR will change this step() to return a dict of client index -> EngineCoreOutputs, and the bool flag indicating whether a forward pass happened is kind of separate from this.
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @njhill.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork