[Kernel] Apply torch.Tag.needs_fixed_stride_order only for torch==2.6.0
Summary: In torch 2.6.0, torch accidentally changed the default for custom operators to be "requires_contiguous". As a workaround, vLLM added the needs_fixed_stride_order tag to a large number of custom operators.
vLLM is currently on torch 2.7.0, which has reverted the default for custom operators back to needs_fixed_stride_order. This PR cleans up the kernel logic by flipping the default back.
The other reason I want to flip the default back is that needs_fixed_stride_order is actually buggy, and torch 2.8.0 has better behavior for custom operators with no layout tags set.
Also, Kaichao tells me that some backends may not have moved to PyTorch 2.7.0 yet (https://github.com/vllm-project/vllm/issues/8932), so I didn't delete the workaround code in this PR.
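For illustration, here is a minimal sketch of the idea (not the actual vLLM registration code; the "mylib" namespace and my_op are hypothetical): gate the layout tag on the installed torch version so the workaround only applies on torch 2.6.x.

```python
# Sketch: apply torch.Tag.needs_fixed_stride_order only on torch 2.6.x,
# where the default for untagged custom ops was accidentally changed.
import torch
from torch.library import Library

# Only torch 2.6.x needs the explicit tag; other versions use the default.
tags = ()
if torch.__version__.startswith("2.6"):
    tags = (torch.Tag.needs_fixed_stride_order,)

# Hypothetical op namespace and schema, for illustration only.
mylib = Library("mylib", "FRAGMENT")
mylib.define("my_op(Tensor x) -> Tensor", tags=tags)

def my_op_impl(x: torch.Tensor) -> torch.Tensor:
    # Placeholder implementation; a real kernel would go here.
    return x.clone()

mylib.impl("my_op", my_op_impl, "CompositeExplicitAutograd")
```

The point is only that the tag becomes a per-version default rather than being hard-coded on every operator, which is why the torch 2.6.0 code path can stay around for backends that haven't upgraded yet.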
Test Plan:
- Existing tests
- Ran pytest tests/compile/test_full_graph.py (this was the test that originally caused us to add the needs_fixed_stride_order tag; see #12721 for context)
👋 Hi! Thank you for contributing to the vLLM project.
💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of it by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.
Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.
🚀
@houseroad as far as I can tell, those tests are failing for me locally (on the base revision, before my commit). Should I try to rebase this PR?
Some data points:
- It looks like the spec decoding tests are failing on main too; they just don't seem to run on all PRs.
- buildkite/ci/pr/lora-test-1: I ran this test locally and it passed.
Speculative decoding seems broken on trunk. Rerunning the lora tests now.
Let's try a rebase and see if the tests pass?
Rebased.
I think what's going on with the lora tests is that they require the GPU to be empty. Some previously run test is holding onto GPU memory. @khluu tells me we've seen similar things before in vLLM CI, so I think the problem is pre-existing.
If it doesn't work, we probably need a force merge from Simon or Woosuk. :-)
Lora tests passed, other tests failed 😅
Retrying the CI now.
Could you rebase onto main again? Some tests should already be fixed on main.
@zou3519 want to retry? CI seems more stable rn
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @zou3519.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Oh man this is finally green