vllm
[Hardware][Gaudi][Feature] Support Contiguous Cache Fetch
Adds contiguous cache fetching to avoid the costly gather operation on Gaudi3. Requires the changes in vllm-hpu-extension (https://github.com/HabanaAI/vllm-hpu-extension/pull/17) to work.
This introduces redundant calculations in the decoding phase. The feature improves the performance of all tested workloads across the entire benchmark by 5-12% on Gaudi3. commit further improves the performance of this feature (9-22%). The feature negatively impacts performance on Gaudi2.
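The trade-off described above can be illustrated with a small sketch. This is not the code from this PR or from vllm-hpu-extension: the function names `gather_fetch` and `contiguous_fetch`, the list-based cache, and the boolean mask are all hypothetical, chosen only to show why reading one contiguous range avoids a gather while pulling in blocks that the decode step does not need.

```python
# Illustrative only: a toy model of gather vs. contiguous cache fetch.
# All names here are hypothetical and do not come from vLLM.

def gather_fetch(cache, block_ids):
    """Pick exactly the requested blocks, one index at a time (a gather)."""
    return [cache[i] for i in block_ids]


def contiguous_fetch(cache, block_ids):
    """Read one contiguous range covering every requested block.

    Returns the range plus a mask marking which entries were requested.
    Entries with a False mask are the redundant work: they are read and
    processed even though no sequence refers to them.
    """
    upper = max(block_ids) + 1
    window = cache[:upper]            # single contiguous slice, no gather
    wanted = set(block_ids)
    mask = [i in wanted for i in range(upper)]
    return window, mask


cache = ["b0", "b1", "b2", "b3", "b4"]
block_ids = [3, 1]

gathered = gather_fetch(cache, block_ids)
window, mask = contiguous_fetch(cache, block_ids)
redundant = mask.count(False)         # blocks fetched but not needed
```

In this toy case the contiguous path reads four blocks to serve two, which is the kind of redundant work the description mentions. Whether that pays off depends on how expensive the gather is on the target hardware.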
Set the VLLM_CONTIGUOUS_PA=true environment variable to enable it.
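A minimal sketch of how a boolean flag like this might be read from the environment. Only the variable name `VLLM_CONTIGUOUS_PA` and the value `true` come from the description; the helper `contiguous_pa_enabled`, the default of disabled, and the case-insensitive comparison are assumptions, not vLLM's actual parsing logic.

```python
import os


def contiguous_pa_enabled(environ=None):
    """Return True when VLLM_CONTIGUOUS_PA is set to 'true'.

    Hypothetical helper: the default (disabled) and the case-insensitive
    match are assumptions made for this sketch.
    """
    env = os.environ if environ is None else environ
    return env.get("VLLM_CONTIGUOUS_PA", "false").strip().lower() == "true"


# Passing a dict keeps the example independent of the real environment.
enabled = contiguous_pa_enabled({"VLLM_CONTIGUOUS_PA": "true"})
disabled = contiguous_pa_enabled({})
```

Because the feature is opt-in, leaving the variable unset keeps the existing behaviour, which matters on Gaudi2 where the description reports a slowdown.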
👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.
Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
To run CI, PR reviewers can do one of these:
- Add the ready label to the PR
- Enable auto-merge.
🚀
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @zhouyu5.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
/ready
LGTM. Leaving it to @youkaichao.
Since this PR only changes the HPU code and adds a new env var, I think it's fine from my perspective.
The CI is not giving stable results; I will trigger it again.
Feel free to send me your email so I can add you to our Buildkite org. That way you can retry each job instead of retrying the whole build, which is costly. If the CI result is still not stable after retrying, please just comment here and your PR reviewers can decide whether it should be force-merged.
Thank you~ @khluu Please add me: [email protected]
I sent an invite.
All tests passed. Could you help merge it, @comaniac?