[Hardware][Gaudi][Feature] Support Contiguous Cache Fetch

Open zhouyu5 opened this issue 10 months ago • 10 comments

Adds contiguous cache fetching to avoid the costly gather operation on Gaudi3. Requires the changes in vllm-hpu-extension (https://github.com/HabanaAI/vllm-hpu-extension/pull/17) to work.

This introduces redundant calculations in the decoding phase. The feature improves performance on all tested workloads across the entire benchmark (5-12%) on Gaudi3. An additional commit further improves the performance of this feature (9-22%). The feature negatively impacts performance on Gaudi2.

Set the environment variable VLLM_CONTIGUOUS_PA=true to enable it.
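
As a rough illustration of the trade-off described in this PR (not the actual HPU implementation), the toy Python sketch below contrasts a gather-style fetch with a single contiguous slice plus a mask. The helper names, the list-based cache, the choice to start the slice at block 0, and the values accepted by `contiguous_pa_enabled` are all assumptions made for illustration; only the `VLLM_CONTIGUOUS_PA` variable name comes from this PR.

```python
import os


def contiguous_pa_enabled(env=None):
    """Hypothetical helper: treat 'true' or '1' (case-insensitive) as enabled.

    How vLLM actually parses VLLM_CONTIGUOUS_PA is not shown in this thread.
    """
    env = os.environ if env is None else env
    return env.get("VLLM_CONTIGUOUS_PA", "false").strip().lower() in ("true", "1")


def fetch_gather(cache, block_ids):
    """Gather-style fetch: index the cache once per block that is in use."""
    return [cache[i] for i in block_ids]


def fetch_contiguous(cache, block_ids):
    """Fetch one contiguous slice that covers every block in use.

    Returns the slice and a mask marking which fetched blocks are really
    used. Blocks with a False mask entry are fetched and processed anyway,
    which is the redundant work mentioned in the description.
    """
    if not block_ids:
        return [], []
    upper = max(block_ids) + 1          # assumption: the slice starts at block 0
    used = set(block_ids)
    window = cache[:upper]              # single contiguous read, no gather
    mask = [i in used for i in range(upper)]
    return window, mask


# Toy data: 8 cache blocks, of which the current batch uses blocks 1, 4 and 5.
cache = [f"block{i}" for i in range(8)]
block_ids = [1, 4, 5]

window, mask = fetch_contiguous(cache, block_ids)
selected = [block for block, keep in zip(window, mask) if keep]
```

In this toy case the contiguous path reads six blocks to use three, so it only pays off when a slice is sufficiently cheaper than a gather on the target hardware, which is consistent with the Gaudi3 gains and the Gaudi2 regression reported above.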

zhouyu5 avatar Jan 17 '25 03:01 zhouyu5

👋 Hi! Thank you for contributing to the vLLM project. Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR.
  • Enable auto-merge.

🚀

github-actions[bot] avatar Jan 17 '25 03:01 github-actions[bot]

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @zhouyu5.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Jan 23 '25 08:01 mergify[bot]

/ready

zhouyu5 avatar Jan 23 '25 09:01 zhouyu5

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @zhouyu5.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Feb 08 '25 07:02 mergify[bot]

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @zhouyu5.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Feb 13 '25 03:02 mergify[bot]

> LGTM. Leave to @youkaichao

Since this PR only changes the HPU code and adds a new env var, I think it's fine from my perspective.

youkaichao avatar Feb 18 '25 03:02 youkaichao

CI not giving stable results, will trigger it again.

zhouyu5 avatar Feb 18 '25 08:02 zhouyu5

> CI not giving stable results, will trigger it again.

feel free to send me your email so I can add you in our Buildkite org. That way you can retry each job instead of retrying the whole build, which is costly. If the CI result is still not stable after retrying, please just comment here and your PR reviewers can decide whether it should be force-merged.

khluu avatar Feb 18 '25 08:02 khluu

> feel free to send me your email so I can add you in our Buildkite org. That way you can retry each job instead of retrying the whole build, which is costly. If the CI result is still not stable after retrying, please just comment here and your PR reviewers can decide whether it should be force-merged.

Thank you~ @khluu Please add me: [email protected]

zhouyu5 avatar Feb 18 '25 09:02 zhouyu5

> > feel free to send me your email so I can add you in our Buildkite org. That way you can retry each job instead of retrying the whole build, which is costly. If the CI result is still not stable after retrying, please just comment here and your PR reviewers can decide whether it should be force-merged.
>
> Thank you~ @khluu Please add me: [email protected]

I sent an invite

khluu avatar Feb 19 '25 00:02 khluu

All tests passed. Could you help merge it? @comaniac

zhouyu5 avatar Feb 19 '25 02:02 zhouyu5