Update PyTorch to 2.7.0
Notable changes:
- PyTorch 2.7.0 has dropped CUDA 12.4, so the remaining options are 12.6 and 12.8
- CUDA 12.6 has a build issue (https://github.com/vllm-project/vllm/issues/15435#issuecomment-2775924628), so only 12.8 remains
- We need new xformers, flashinfer, and mamba-ssm packages, so let's build them from source for now. They can be installed from PyPI once they are published upstream with 2.7.0 support
- Leave XPU for later for the Intel folks to pick up, as it requires a newer version of intel-extension-for-pytorch
👋 Hi! Thank you for contributing to the vLLM project.
💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small but essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.
Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.
🚀
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @huydhn.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Now that torch 2.7 has been released (https://pypi.org/project/torch/2.7.0/), can this be updated?
Yup, it can be updated now. Let me start working on that. On the other hand, I think I will keep the state of this PR as a reference, because we plan to do similar validation for the next PyTorch release. Ideally, that validation should be done against the PyTorch RC before it is published to PyPI.
@youkaichao @zou3519 I think some of the failed tests are pointing to a real torch.compile compatibility issue with 2.7.0 https://buildkite.com/vllm/ci/builds/18733#01967697-2cf2-469f-91a7-63e66f589d77/205-1262
I can reproduce it with pytest -v tests/models/test_transformers.py -k test_models[meta-llama/Llama-3.2-1B-Instruct-transformers]
https://paste.sh/qgshOYtT#6dx7jO1lRsFmoSN43N6aAXeE
Any thoughts?
vLLM monkeypatches Dynamo to provide a list of the file paths it traced over. Somehow the monkeypatched function returned the filename "<string>". I'll try to run this and see; my best guess right now is that we need to filter out entries that are not file paths.
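For illustration, a minimal sketch of the kind of filtering meant here (the helper name and call site are hypothetical, not vLLM's actual patch point):

import os

def filter_traced_files(traced_files):
    # Dynamo can report pseudo-filenames such as "<string>" or "<frozen ...>"
    # for dynamically generated code; keep only entries that exist on disk.
    return [f for f in traced_files if not f.startswith("<") and os.path.isfile(f)]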
After I worked around that (by filtering out "<string>"), there are more issues. Currently getting:
torch.fx.experimental.symbolic_shapes.ConstraintViolationError: Constraints violated (L['input_ids'].size()[0], L['positions'].size()[0])! For more information, run with TORCH_LOGS="+dynamic".
- Not all values of RelaxedUnspecConstraint(L['input_ids'].size()[0]) are valid because L['input_ids'].size()[0] was inferred to be a constant (16384).
- Not all values of RelaxedUnspecConstraint(L['positions'].size()[0]) are valid because L['positions'].size()[0] was inferred to be a constant (16384).
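For context, a standalone sketch (not the vLLM code path) of how this class of error arises: when a dimension is requested as dynamic but the traced code specializes it to a single constant, recent torch versions raise a similar ConstraintViolationError.

import torch

def fn(x):
    # Branching on the concrete size forces Dynamo to specialize dim 0.
    if x.size(0) == 16:
        return x * 2
    return x + 1

x = torch.randn(16, 8)
torch._dynamo.mark_dynamic(x, 0)  # request a dynamic first dimension
try:
    torch.compile(fn)(x)          # specialization conflicts with that request
except Exception as e:
    print(type(e).__name__, ":", e)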
Synced with @zou3519 offline:
- There are at least two problems, as mentioned in the previous comments. @zou3519 is working on a fix, so the plan is to land that first and then rebase this change on top.
- The recent transformers bump (https://github.com/vllm-project/vllm/pull/17116) may also be related, as the failure seems to come from there.
@zou3519 Thank you for the fix! The failures with torch.compile have been resolved.
@simon-mo I think this is ready to land. The remaining failures on CI are:
- 2 failed speculative decode tests, which I think are existing failures from trunk and are not related to this update. I'm attempting to fix them separately in https://github.com/vllm-project/vllm/pull/17371.
- On the other hand, the Python-only installation test failure is legitimate, but I think it can only be fixed by updating the nightly binaries. I have a PR lined up to do that after this lands: https://github.com/vllm-project/vllm/pull/17224 (it needs a rebase). Let me know if that's the usual procedure.
If those are OK, please help land this PR so we can finally have vLLM on 2.7.0 :)
Btw, did you need https://github.com/vllm-project/vllm/pull/17338 or was that just because I'm on torch > 2.7?
A new release of xformers with torch 2.7 support has come out: https://github.com/facebookresearch/xformers/releases/tag/v0.0.30
Btw, did you need #17338 or was that just because I'm on torch > 2.7?
I don't see any failures due to #17338 on CI, so I guess it's only applicable to PyTorch nightly, not the 2.7.0 release. On the other hand, if you can land your PR before this one is merged, I can do a rebase just to be sure.
No need to wait for my PR if the signals on this PR are good
When will this be synced to the pip repository?
Does vLLM publish nightlies to some pip channel? Asking because I'd like to try out vLLM with PyTorch 2.7.0.
@vadimkantorov
pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
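As a quick sanity check after installing (a generic snippet, not from the thread), you can confirm which torch the nightly wheel pulled in:

import torch, vllm
print(vllm.__version__, torch.__version__)  # expect a .dev vllm version and torch 2.7.0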
I couldn't find this nightly instruction in the installation section of the README. It might be good to add it there too!
pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
For me, this again tries to fetch 2.6.0 :( despite having 2.7.0 installed:
vllm-0.8.5.post1-cp38-abi3-manylinux1_x86_64.whl does not look like a nightly wheel :( it looks like the release wheel. I think pip discovers this wheel in my local cache because it already exists and therefore does not attempt to install the nightly :( A bug in pip? I manually went to https://wheels.vllm.ai/nightly/vllm and found this wheel, https://wheels.vllm.ai/vllm-0.8.5.dev599%2Bg9fbf2bfbd-cp38-abi3-manylinux1_x86_64.whl, which indeed looks like a nightly, but it returns a 404 :(
$ pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://wheels.vllm.ai/nightly
Collecting vllm
Downloading vllm-0.8.5.post1-cp38-abi3-manylinux1_x86_64.whl (326.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 326.4/326.4 MB 10.2 MB/s eta 0:00:00
...
Collecting torch==2.6.0
Downloading torch-2.6.0-cp310-cp310-manylinux1_x86_64.whl (766.7 MB)
━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 221.3/766.7 MB 252.7 MB/s eta 0:00:03
ERROR: Operation cancelled by user
# second try:
$ pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly/vllm
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://wheels.vllm.ai/nightly/vllm
Collecting vllm
Using cached vllm-0.8.5.post1-cp38-abi3-manylinux1_x86_64.whl (326.4 MB)
Basically, I can't find direct URLs to the nightly wheels :( which might be needed to work around pip not wanting to install the nightly for some reason.
So far I managed to find a published commit via:
pip index versions vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
# and then
pip install -U vllm==0.8.5.dev600+g7ea6cb28b --pre --extra-index-url https://wheels.vllm.ai/nightly
Hi, why upgrade to PyTorch 2.7?
I found that after this commit, the TPOT/ITL of the Qwen/Qwen2.5-14B-Instruct model on an H20 dropped from 20 ms to 10 ms. I want to know which part of the code accounts for this improvement. Is it PyTorch 2.7?