vllm
vllm copied to clipboard
[V1][CUDA Graph] Fix attention metadata tensor sizes for padded batches
Fixes #23789. Move CUDA graph padding before attention metadata construction to ensure tensor sizes match execution environment. Fixes crashes in FlashMLA and other attention backends caused by metadata/execution size mismatches.
- Pass padded token count to _prepare_inputs()
- Update slot_mapping and num_actual_tokens to use padded sizes
- Eliminates need for backend-specific workarounds
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ayushsatyam146.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Thanks for the review @LucasWilkinson. I will work on the suggested changes. Do you have any idea on why am I getting CI failures on this consistently?
Do you have any idea on why am I getting CI failures on this consistently?
The CI can be flaky; I would just keep merging/rebasing-off main at a pretty regularly; tests are usually fixed on main in pretty short order
@LucasWilkinson I have accomodated your suggestions in the new changes. Can you please take a look, Thanks!
Hi @LucasWilkinson, I addressed your comments, PTAL thanks!
Move CUDA graph padding before attention metadata construction to ensure tensor sizes match execution environment. Fixes crashes in FlashMLA and other attention backends caused by metadata/execution size mismatches.
I guess this is the problem only for full cudagraph. Am I right?
Found another potential issue: https://github.com/vllm-project/vllm/pull/24002/files#diff-80ee7e2a62f9dcfbb8a312dc4e3948557e97ef187290daebbcae1e28596bda29R746-R753:~:text=self.seq_lens,().item()
self.seq_lens.np[num_reqs:].fill(0)
should be
self.seq_lens.np[num_actual_reqs:].fill(0)
since self.input_batch.num_computed_tokens_cpu[:num_reqs] may have padded zone uncleared.
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ayushsatyam146.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
@ayushsatyam146 any progress on this?
Hey @LucasWilkinson I got sick earlier this week and was unable to work on this but I will soon do the required changes and push them, Thanks for the patience!
@LucasWilkinson, I have done all the necessary changes that we agreed upon after discussion in the above threads, but I am unable to make the CI pipelines pass for these, and I think these are not flaky failures. I would really appreciate some help or maybe even some hints regarding solving these failures. Thanks!
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ayushsatyam146.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ayushsatyam146.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ayushsatyam146.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
⚠️ The sha of the head commit of this PR conflicts with #25282. Mergify cannot evaluate rules on this PR. ⚠️
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ayushsatyam146.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ayushsatyam146.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ayushsatyam146.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ayushsatyam146.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Documentation preview: https://vllm--24002.org.readthedocs.build/en/24002/
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ayushsatyam146.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork