[Kernel] Apply torch.Tag.needs_fixed_stride_order only for torch==2.6.0
Summary: In torch 2.6.0, torch accidentally changed the default for custom operators to be "requires_contiguous". As a workaround, vLLM added the needs_fixed_stride_order tag to a large number of custom operators.
vLLM is currently on torch 2.7.0, which has reverted the default for custom operators back to needs_fixed_stride_order. This PR cleans up the kernel logic by flipping the default back.
The other reason I want to flip the default back is that needs_fixed_stride_order is actually buggy, and torch 2.8.0 has better behavior for custom operators with no layout tags set.
Also, Kaichao tells me that some backends may not have moved to PyTorch 2.7.0 yet (https://github.com/vllm-project/vllm/issues/8932), so I didn't delete the workaround code in this PR.
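For illustration, here is a minimal sketch of the idea (not the actual vLLM registration code; the "mylib" namespace and my_op are hypothetical): gate the layout tag on the installed torch version so the workaround only applies on torch 2.6.x.

```python
# Sketch: apply torch.Tag.needs_fixed_stride_order only on torch 2.6.x,
# where the default for untagged custom ops was accidentally changed.
import torch
from torch.library import Library

# Only torch 2.6.x needs the explicit tag; other versions use the default.
tags = ()
if torch.__version__.startswith("2.6"):
    tags = (torch.Tag.needs_fixed_stride_order,)

# Hypothetical op namespace and schema, for illustration only.
mylib = Library("mylib", "FRAGMENT")
mylib.define("my_op(Tensor x) -> Tensor", tags=tags)

def my_op_impl(x: torch.Tensor) -> torch.Tensor:
    # Placeholder implementation; a real kernel would go here.
    return x.clone()

mylib.impl("my_op", my_op_impl, "CompositeExplicitAutograd")
```

The point is only that the tag becomes a per-version default rather than being hard-coded on every operator, which is why the torch 2.6.0 code path can stay around for backends that haven't upgraded yet.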
Test Plan:
- Existing tests
- Ran pytest tests/compile/test_full_graph.py (this was the test that originally caused us to add the needs_fixed_stride_order tag; see #12721 for context)
👋 Hi! Thank you for contributing to the vLLM project.
💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of it by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.
Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.
🚀
@houseroad as far as I can tell, those tests are failing for me locally (on the base revision, before my commit). Should I try to rebase this PR?
Some data points:
- It looks like the spec decoding tests are failing on main too; they just don't seem to run on all PRs.
- buildkite/ci/pr/lora-test-1: I ran this test locally and it passed.
Speculative decoding seems broken on trunk. Rerunning the lora tests now.
Let's try a rebase and see if the tests pass?
Rebased.
I think what's going on with the lora tests is that they require the GPU to be empty. Some previously run test is holding onto GPU memory. @khluu tells me we've seen similar things before in vLLM CI, so I think the problem is pre-existing.
If it doesn't work, we probably need a force merge from Simon or Woosuk. :-)
Lora tests passed, other tests failed 😅
Retrying the CI now.
Could you rebase onto main again? Some tests should already be fixed on main.
@zou3519 want to retry? CI seems more stable rn
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @zou3519.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Oh man this is finally green