[CI/Build] Add support for Python 3.13
FIX https://github.com/vllm-project/vllm/issues/12083
Dependencies that are blockers for Python 3.13 support:
- [ ] ray (seems to be a blocker until 2.45 comes out: https://github.com/ray-project/ray/issues/49738#issuecomment-2755842804)
- [x] xgrammar (issue https://github.com/mlc-ai/xgrammar/issues/193)
- [x] torchaudio==2.5.1 (resolved by https://github.com/vllm-project/vllm/pull/12721)
- [x] vllm-flash-attn (https://github.com/vllm-project/flash-attention/pull/47)
👋 Hi! Thank you for contributing to the vLLM project.
💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.
Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.
🚀
I think you'd also need a matching PR on https://github.com/vllm-project/flash-attention/blob/main/CMakeLists.txt#L22
Run locally with `TORCH_CUDA_ARCH_LIST="Auto" VLLM_BUILD_DIR=build pip install -v --no-clean --no-build-isolation -e .` and you should run into the error if you didn't patch this part.
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @mgoin.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
https://github.com/ray-project/ray/issues/49738
ray already has 3.13 nightly builds; a stable wheel will probably come with the next release.
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @mgoin.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Can you please update the blocker list? Sounds like xgrammar is no longer an issue since we're on 0.1.15 now: https://github.com/vllm-project/vllm/pull/14563. Also, Python 3.13 is now supported in flash-attention as well!
I bumped into this error when I ran `pip install vllm` in a newly created environment.
Python version: 3.13.0
Collecting numba==0.60.0 (from vllm)
Using cached numba-0.60.0.tar.gz (2.7 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [24 lines of output]
Traceback (most recent call last):
File "/home/zimin/miniconda3/envs/vllm/lib/python3.13/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 389, in <module>
main()
~~~~^^
File "/home/zimin/miniconda3/envs/vllm/lib/python3.13/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 373, in main
json_out["return_val"] = hook(**hook_input["kwargs"])
~~~~^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zimin/miniconda3/envs/vllm/lib/python3.13/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 143, in get_requires_for_build_wheel
return hook(config_settings)
File "/tmp/pip-build-env-moyn66y8/overlay/lib/python3.13/site-packages/setuptools/build_meta.py", line 334, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=[])
~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/pip-build-env-moyn66y8/overlay/lib/python3.13/site-packages/setuptools/build_meta.py", line 304, in _get_build_requires
self.run_setup()
~~~~~~~~~~~~~~^^
File "/tmp/pip-build-env-moyn66y8/overlay/lib/python3.13/site-packages/setuptools/build_meta.py", line 522, in run_setup
super().run_setup(setup_script=setup_script)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/pip-build-env-moyn66y8/overlay/lib/python3.13/site-packages/setuptools/build_meta.py", line 320, in run_setup
exec(code, locals())
~~~~^^^^^^^^^^^^^^^^
File "<string>", line 51, in <module>
File "<string>", line 48, in _guard_py_ver
RuntimeError: Cannot install on Python version 3.13.2; only versions >=3.9,<3.13 are supported.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
Updated the list, and it seems like ray is still going to be a blocker until 2.45 comes out: https://github.com/ray-project/ray/issues/49738#issuecomment-2755842804
Also, numba needs to be 0.62 to work with Python 3.13.
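For context, the RuntimeError in the report above comes from a Python-version guard in numba's setup.py (the `_guard_py_ver` frame in the traceback), which runs while pip is still collecting build requirements, before anything gets compiled. A rough sketch of that kind of guard, with bounds taken from the error message (this is not numba's exact code):

```python
# Rough sketch of a setup.py version guard like the one raising the error
# above; the bounds mirror the error message, not numba's actual source.
import sys

_MIN_PY = (3, 9)
_MAX_PY = (3, 13)  # exclusive upper bound, so any 3.13.x is rejected


def _guard_py_ver():
    # setup.py calls this at import time, which is why pip fails during
    # "Getting requirements to build wheel", before any compilation starts.
    current = sys.version_info[:2]
    if not (_MIN_PY <= current < _MAX_PY):
        min_s = ".".join(map(str, _MIN_PY))
        max_s = ".".join(map(str, _MAX_PY))
        raise RuntimeError(
            f"Cannot install on Python version {sys.version.split()[0]}; "
            f"only versions >={min_s},<{max_s} are supported."
        )


_guard_py_ver()
```

So until we depend on a numba version that lifts that bound, the sdist build will fail on 3.13 before it even starts.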
@mgoin ray 2.45 is out :)
Thanks for the heads up. Xformers is still a blocker
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @mgoin.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
I noticed that everything in the dependency list at the top is completed now. I installed the xformers dev build, so I figured I would try vLLM.
I'm using Ubuntu 25.04 with Python 3.13.3, CUDA 12.9, and the torch dev build 2.8.0.dev20250626+cu129.
On the main branch, I ran `use_existing_torch.py`, edited the files to allow Python 3.13, and did a `python3 -m build --no-isolation`. It built fine, but when I install and run it I get:
$ vllm serve Qwen/Qwen2.5-1.5B-Instruct
INFO 06-27 21:54:04 [__init__.py:244] Automatically detected platform cuda.
Traceback (most recent call last):
File "/usr/lib/python3.13/inspect.py", line 1087, in findsource
lnum = vars(object)['__firstlineno__'] - 1
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
KeyError: '__firstlineno__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/jason/work/pytorch/try2/bin/vllm", line 5, in <module>
from vllm.entrypoints.cli.main import main
File "/home/jason/work/pytorch/try2/lib/python3.13/site-packages/vllm/entrypoints/cli/__init__.py", line 3, in <module>
from vllm.entrypoints.cli.benchmark.latency import BenchmarkLatencySubcommand
File "/home/jason/work/pytorch/try2/lib/python3.13/site-packages/vllm/entrypoints/cli/benchmark/latency.py", line 5, in <module>
from vllm.benchmarks.latency import add_cli_args, main
File "/home/jason/work/pytorch/try2/lib/python3.13/site-packages/vllm/benchmarks/latency.py", line 16, in <module>
from vllm import LLM, SamplingParams
File "<frozen importlib._bootstrap>", line 1412, in _handle_fromlist
File "/home/jason/work/pytorch/try2/lib/python3.13/site-packages/vllm/__init__.py", line 64, in __getattr__
module = import_module(module_name, __package__)
File "/usr/lib/python3.13/importlib/__init__.py", line 88, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jason/work/pytorch/try2/lib/python3.13/site-packages/vllm/entrypoints/llm.py", line 20, in <module>
from vllm.config import (CompilationConfig, ModelDType, TokenizerMode,
is_init_field)
File "/home/jason/work/pytorch/try2/lib/python3.13/site-packages/vllm/config.py", line 246, in <module>
@config
^^^^^^
File "/home/jason/work/pytorch/try2/lib/python3.13/site-packages/vllm/config.py", line 199, in config
attr_docs = get_attr_docs(cls)
File "/home/jason/work/pytorch/try2/lib/python3.13/site-packages/vllm/config.py", line 154, in get_attr_docs
cls_node = ast.parse(textwrap.dedent(inspect.getsource(cls))).body[0]
~~~~~~~~~~~~~~~~~^^^^^
File "/usr/lib/python3.13/inspect.py", line 1258, in getsource
lines, lnum = getsourcelines(object)
~~~~~~~~~~~~~~^^^^^^^^
File "/usr/lib/python3.13/inspect.py", line 1240, in getsourcelines
lines, lnum = findsource(object)
~~~~~~~~~~^^^^^^^^
File "/usr/lib/python3.13/inspect.py", line 1089, in findsource
raise OSError('source code not available')
OSError: source code not available
It looks like something about fetching docstrings? Is there an easy fix?
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @mgoin.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
The direct cause:
from pydantic import ConfigDict
from pydantic.dataclasses import dataclass

@dataclass(config=ConfigDict(arbitrary_types_allowed=True))
class ModelConfig:
    ...
In Python 3.13, @dataclass from pydantic returns a wrapped class whose source inspect can no longer retrieve.
AFAIK my solution would be something like a @register_doc decorator applied before @dataclass breaks it; see the sketch below.
I also tried saving the "original class" in the pydantic implementation, but in my test the original class is the same object as the wrapped class, so the inspect library fails on that too. It does not work. I guess we should process the documentation as early as possible, before the class gets wrapped.
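To illustrate the idea, a minimal sketch that records attribute docstrings while the class is still unwrapped. The names `register_doc` and `_ATTR_DOCS` are hypothetical, chosen for illustration only; this is not what vLLM's fix ended up doing, and it assumes the attribute-docstring pattern used in vllm/config.py:

```python
# Minimal sketch of the "@register_doc before @dataclass" idea described
# above. `register_doc` and `_ATTR_DOCS` are hypothetical names, not
# vLLM's actual fix.
import ast
import inspect
import textwrap

_ATTR_DOCS: dict[str, dict[str, str]] = {}


def register_doc(cls):
    """Record attribute docstrings while `cls` is still a plain class."""
    # inspect.getsource still works here because pydantic's @dataclass has
    # not wrapped the class yet.
    cls_node = ast.parse(textwrap.dedent(inspect.getsource(cls))).body[0]
    docs: dict[str, str] = {}
    for stmt, nxt in zip(cls_node.body, cls_node.body[1:]):
        # An annotated assignment immediately followed by a bare string
        # literal is the attribute-docstring pattern used in vllm/config.py.
        if (isinstance(stmt, ast.AnnAssign)
                and isinstance(stmt.target, ast.Name)
                and isinstance(nxt, ast.Expr)
                and isinstance(nxt.value, ast.Constant)
                and isinstance(nxt.value.value, str)):
            docs[stmt.target.id] = nxt.value.value
    _ATTR_DOCS[cls.__qualname__] = docs
    return cls


# Intended usage -- decorator order matters: register_doc must be the inner
# (lower) decorator so it sees the unwrapped class:
#
#     from pydantic import ConfigDict
#     from pydantic.dataclasses import dataclass
#
#     @dataclass(config=ConfigDict(arbitrary_types_allowed=True))
#     @register_doc
#     class ModelConfig:
#         model: str = "facebook/opt-125m"
#         """Name or path of the Hugging Face model to use."""
```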
Hi @DKingAlpha, thanks for testing. I've resolved this by changing the method used in config.py.
@mgoin Thanks, it's working now.
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @mgoin.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork