
[CI/Build] Add support for Python 3.13

Open · mgoin opened this pull request 9 months ago · 8 comments

FIX https://github.com/vllm-project/vllm/issues/12083

Dependencies that are blockers for python 3.13 support:

  • [ ] ray (it seems to be a blocker until 2.45 comes out https://github.com/ray-project/ray/issues/49738#issuecomment-2755842804)
  • [x] xgrammar (issue https://github.com/mlc-ai/xgrammar/issues/193)
  • [x] torchaudio==2.5.1 (resolved by https://github.com/vllm-project/vllm/pull/12721)
  • [x] vllm-flash-attn (https://github.com/vllm-project/flash-attention/pull/47)

mgoin avatar Feb 12 '25 16:02 mgoin

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which covers a small, essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

github-actions[bot] avatar Feb 12 '25 16:02 github-actions[bot]

I think you'd also need a matching PR on https://github.com/vllm-project/flash-attention/blob/main/CMakeLists.txt#L22

Run locally with TORCH_CUDA_ARCH_LIST="Auto" VLLM_BUILD_DIR=build pip install -v --no-clean --no-build-isolation -e . and you should run into the error if you didn't patch this part.

Ryp avatar Feb 12 '25 17:02 Ryp

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @mgoin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Feb 17 '25 15:02 mergify[bot]

https://github.com/ray-project/ray/issues/49738

ray already has 3.13 nightly builds; a stable wheel will probably come with the next release.

manueldeprada avatar Feb 28 '25 14:02 manueldeprada

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @mgoin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Mar 06 '25 01:03 mergify[bot]

Can you please update the blocker list? Sounds like xgrammar is no longer an issue since we're on 0.1.15 now: https://github.com/vllm-project/vllm/pull/14563 Also, Python 3.13 is now supported in flash attention as well!

Ryp avatar Mar 12 '25 14:03 Ryp

Bumped into this error when I ran pip install vllm in a newly created environment.

Python version 3.13.0

Collecting numba==0.60.0 (from vllm)
  Using cached numba-0.60.0.tar.gz (2.7 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [24 lines of output]
      Traceback (most recent call last):
        File "/home/zimin/miniconda3/envs/vllm/lib/python3.13/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 389, in <module>
          main()
          ~~~~^^
        File "/home/zimin/miniconda3/envs/vllm/lib/python3.13/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 373, in main
          json_out["return_val"] = hook(**hook_input["kwargs"])
                                   ~~~~^^^^^^^^^^^^^^^^^^^^^^^^
        File "/home/zimin/miniconda3/envs/vllm/lib/python3.13/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 143, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-moyn66y8/overlay/lib/python3.13/site-packages/setuptools/build_meta.py", line 334, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
                 ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-moyn66y8/overlay/lib/python3.13/site-packages/setuptools/build_meta.py", line 304, in _get_build_requires
          self.run_setup()
          ~~~~~~~~~~~~~~^^
        File "/tmp/pip-build-env-moyn66y8/overlay/lib/python3.13/site-packages/setuptools/build_meta.py", line 522, in run_setup
          super().run_setup(setup_script=setup_script)
          ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-moyn66y8/overlay/lib/python3.13/site-packages/setuptools/build_meta.py", line 320, in run_setup
          exec(code, locals())
          ~~~~^^^^^^^^^^^^^^^^
        File "<string>", line 51, in <module>
        File "<string>", line 48, in _guard_py_ver
      RuntimeError: Cannot install on Python version 3.13.2; only versions >=3.9,<3.13 are supported.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

21m1n avatar Mar 24 '25 02:03 21m1n

Updated the list and it seems like ray is still going to be a blocker until 2.45 comes out https://github.com/ray-project/ray/issues/49738#issuecomment-2755842804

mgoin avatar Mar 28 '25 13:03 mgoin

Also, numba needs to be 0.62 to work with Python 3.13.

dan-and avatar Apr 23 '25 07:04 dan-and

@mgoin ray 2.45 is out :)

manueldeprada avatar Apr 29 '25 23:04 manueldeprada

Thanks for the heads up. Xformers is still a blocker.

mgoin avatar Apr 30 '25 01:04 mgoin

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @mgoin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar May 14 '25 11:05 mergify[bot]

I noticed that everything in the dependency list at the top is completed now. I installed the xformers dev build, so I figured I would try vLLM.

I'm using Ubuntu 25.04 with Python 3.13.3, CUDA 12.9, and torch dev build 2.8.0.dev20250626+cu129.

On the main branch, I ran use_existing_torch.py, edited files to allow Python 3.13, and did a python3 -m build --no-isolation. It built fine, but when I install and run it I get:

$ vllm serve Qwen/Qwen2.5-1.5B-Instruct
INFO 06-27 21:54:04 [__init__.py:244] Automatically detected platform cuda.
Traceback (most recent call last):
  File "/usr/lib/python3.13/inspect.py", line 1087, in findsource
    lnum = vars(object)['__firstlineno__'] - 1
           ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
KeyError: '__firstlineno__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jason/work/pytorch/try2/bin/vllm", line 5, in <module>
    from vllm.entrypoints.cli.main import main
  File "/home/jason/work/pytorch/try2/lib/python3.13/site-packages/vllm/entrypoints/cli/__init__.py", line 3, in <module>
    from vllm.entrypoints.cli.benchmark.latency import BenchmarkLatencySubcommand
  File "/home/jason/work/pytorch/try2/lib/python3.13/site-packages/vllm/entrypoints/cli/benchmark/latency.py", line 5, in <module>
    from vllm.benchmarks.latency import add_cli_args, main
  File "/home/jason/work/pytorch/try2/lib/python3.13/site-packages/vllm/benchmarks/latency.py", line 16, in <module>
    from vllm import LLM, SamplingParams
  File "<frozen importlib._bootstrap>", line 1412, in _handle_fromlist
  File "/home/jason/work/pytorch/try2/lib/python3.13/site-packages/vllm/__init__.py", line 64, in __getattr__
    module = import_module(module_name, __package__)
  File "/usr/lib/python3.13/importlib/__init__.py", line 88, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jason/work/pytorch/try2/lib/python3.13/site-packages/vllm/entrypoints/llm.py", line 20, in <module>
    from vllm.config import (CompilationConfig, ModelDType, TokenizerMode,
                             is_init_field)
  File "/home/jason/work/pytorch/try2/lib/python3.13/site-packages/vllm/config.py", line 246, in <module>
    @config
     ^^^^^^
  File "/home/jason/work/pytorch/try2/lib/python3.13/site-packages/vllm/config.py", line 199, in config
    attr_docs = get_attr_docs(cls)
  File "/home/jason/work/pytorch/try2/lib/python3.13/site-packages/vllm/config.py", line 154, in get_attr_docs
    cls_node = ast.parse(textwrap.dedent(inspect.getsource(cls))).body[0]
                                         ~~~~~~~~~~~~~~~~~^^^^^
  File "/usr/lib/python3.13/inspect.py", line 1258, in getsource
    lines, lnum = getsourcelines(object)
                  ~~~~~~~~~~~~~~^^^^^^^^
  File "/usr/lib/python3.13/inspect.py", line 1240, in getsourcelines
    lines, lnum = findsource(object)
                  ~~~~~~~~~~^^^^^^^^
  File "/usr/lib/python3.13/inspect.py", line 1089, in findsource
    raise OSError('source code not available')
OSError: source code not available

It looks like something about fetching docstrings? Is there an easy fix?

jasonbishop avatar Jun 28 '25 04:06 jasonbishop

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @mgoin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Jul 11 '25 03:07 mergify[bot]

I noticed that everything in the dependency list at the top is completed now. I installed the xformers dev build, so I figured I would try vLLM.

I'm using Ubuntu 25.04 with Python 3.13.3, CUDA 12.9, and torch dev build 2.8.0.dev20250626+cu129.

On the main branch, I ran use_existing_torch.py, edited files to allow Python 3.13, and did a python3 -m build --no-isolation. It built fine, but when I install and run it I get:

$ vllm serve Qwen/Qwen2.5-1.5B-Instruct
INFO 06-27 21:54:04 [__init__.py:244] Automatically detected platform cuda.
Traceback (most recent call last):
  File "/usr/lib/python3.13/inspect.py", line 1087, in findsource
    lnum = vars(object)['__firstlineno__'] - 1
           ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
KeyError: '__firstlineno__'

It looks like something about fetching docstrings? Is there an easy fix?

The direct cause:

from pydantic import ConfigDict
from pydantic.dataclasses import dataclass

@dataclass(config=ConfigDict(arbitrary_types_allowed=True))
class ModelConfig:
    ...

In Python 3.13, @dataclass from pydantic returns a wrapped class whose source you can no longer retrieve with inspect.getsource.
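
For reference, a minimal repro sketch of that failure mode, assuming Python 3.13 and pydantic v2 are installed, with a placeholder field rather than vLLM's real ModelConfig definition:

# Minimal repro sketch (assumptions: Python 3.13, pydantic v2, run from a
# .py file so source lookup is possible at all). inspect.getsource() on 3.13
# relies on the __firstlineno__ class attribute; the class returned by
# pydantic's @dataclass wrapper may not carry it, in which case source
# lookup fails with OSError.
import inspect

from pydantic import ConfigDict
from pydantic.dataclasses import dataclass


@dataclass(config=ConfigDict(arbitrary_types_allowed=True))
class ModelConfig:
    model: str = "placeholder-model"  # hypothetical field, for illustration only


try:
    print(inspect.getsource(ModelConfig))
except OSError as err:
    # This is the path hit in the traceback above: "source code not available".
    print(f"getsource failed: {err}")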

AFAIK, my solution would be something like a @register_doc decorator that captures the class source before @dataclass breaks source retrieval; a rough sketch follows.
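
Purely to illustrate that idea (register_doc here is hypothetical, not an existing vLLM helper), the decorator could stash the original class source while the class is still unwrapped:

# Hypothetical sketch of the @register_doc idea: capture the class source
# before pydantic's @dataclass replaces the class, so later documentation
# processing never needs inspect.getsource on the wrapped class.
# Assumes Python 3.13, pydantic v2, and that this is run from a .py file.
import inspect

from pydantic import ConfigDict
from pydantic.dataclasses import dataclass

_SOURCE_CACHE: dict[str, str] = {}


def register_doc(cls: type) -> type:
    # Decorators apply bottom-up, so this runs on the original, unwrapped class.
    _SOURCE_CACHE[f"{cls.__module__}.{cls.__qualname__}"] = inspect.getsource(cls)
    return cls


@dataclass(config=ConfigDict(arbitrary_types_allowed=True))
@register_doc  # runs first, before @dataclass replaces the class
class ModelConfig:
    model: str = "placeholder-model"  # hypothetical field, for illustration only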

I also tried saving the "original class" in the pydantic implementation, but in my test the original class was equal to the wrapped class, so the inspect library failed on that too. It does not work. I guess we should process the documentation as early as possible.

DKingAlpha avatar Jul 24 '25 07:07 DKingAlpha

Hi @DKingAlpha, thanks for testing. I've resolved this by changing the method used in config.py.
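
The thread doesn't show the actual change, but for readers hitting the same thing, one getsource-free approach (an illustrative sketch only, not necessarily what the PR does; get_attr_docs here is a simplified stand-in for vLLM's helper) is to parse the class's defining module and read the attribute docstrings from the AST:

# Illustrative sketch: collect "attribute docstrings" (string literals that
# directly follow annotated assignments in a class body) without calling
# inspect.getsource on the class itself. Parsing the defining module avoids
# the missing __firstlineno__ problem on wrapped classes.
import ast
import importlib
import inspect


def get_attr_docs(cls: type) -> dict[str, str]:
    module = importlib.import_module(cls.__module__)
    module_node = ast.parse(inspect.getsource(module))

    # Locate the class definition by name anywhere in the module
    # (simplified: ignores nested classes with duplicate names).
    cls_node = next(
        node for node in ast.walk(module_node)
        if isinstance(node, ast.ClassDef) and node.name == cls.__name__
    )

    docs: dict[str, str] = {}
    prev_attr: str | None = None
    for stmt in cls_node.body:
        if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name):
            prev_attr = stmt.target.id
        elif (prev_attr is not None
              and isinstance(stmt, ast.Expr)
              and isinstance(stmt.value, ast.Constant)
              and isinstance(stmt.value.value, str)):
            docs[prev_attr] = inspect.cleandoc(stmt.value.value)
            prev_attr = None
        else:
            prev_attr = None
    return docs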

mgoin avatar Aug 02 '25 17:08 mgoin

@mgoin thanks, it's working now.

DKingAlpha avatar Aug 05 '25 13:08 DKingAlpha

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @mgoin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Aug 08 '25 23:08 mergify[bot]