[Bug]: FastAPI 0.113.0 breaks vLLM OpenAPI
Your current environment
The output of `python collect_env.py`
Collecting environment information...
WARNING 09-05 21:11:49 cuda.py:22] You are using a deprecated `pynvml` package. Please install `nvidia-ml-py` instead, and make sure to uninstall `pynvml`. When both of them are installed, `pynvml` will take precedence and cause errors. See https://pypi.org/project/pynvml for more information.
WARNING 09-05 21:11:49 _custom_ops.py:18] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
/vllm/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
No module named 'vllm.commit_id'
from vllm.version import __version__ as VLLM_VERSION
PyTorch version: 2.4.0a0+3bcc3cddb5.nv24.07
Is debug build: False
CUDA used to build PyTorch: 12.5
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.30.0
Libc version: glibc-2.35
Python version: 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.5.0-1024-nvidia-64k-aarch64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.5.82
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GH200 480GB
Nvidia driver version: 560.35.03
cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.9.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_adv.so.9.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_cnn.so.9.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_engines_precompiled.so.9.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_graph.so.9.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_heuristic.so.9.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_ops.so.9.2.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 72
On-line CPU(s) list: 0-71
Vendor ID: ARM
Model name: Neoverse-V2
Model: 0
Thread(s) per core: 1
Core(s) per socket: 72
Socket(s): 1
Stepping: r0p0
Frequency boost: disabled
CPU max MHz: 3492.0000
CPU min MHz: 81.0000
BogoMIPS: 2000.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh bti
L1d cache: 4.5 MiB (72 instances)
L1i cache: 4.5 MiB (72 instances)
L2 cache: 72 MiB (72 instances)
L3 cache: 114 MiB (1 instance)
NUMA node(s): 9
NUMA node0 CPU(s): 0-71
NUMA node1 CPU(s):
NUMA node2 CPU(s):
NUMA node3 CPU(s):
NUMA node4 CPU(s):
NUMA node5 CPU(s):
NUMA node6 CPU(s):
NUMA node7 CPU(s):
NUMA node8 CPU(s):
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.24.4
[pip3] nvidia-cudnn-frontend==1.5.1
[pip3] nvidia-dali-cuda120==1.39.0
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-modelopt==0.13.0
[pip3] nvidia-nvimgcodec-cu12==0.2.0.7
[pip3] nvidia-pyindex==1.0.9
[pip3] onnx==1.16.0
[pip3] optree==0.12.1
[pip3] pynvml==11.4.1
[pip3] pytorch-triton==3.0.0+989adb9a2
[pip3] pyzmq==26.0.3
[pip3] torch==2.4.0a0+3bcc3cddb5.nv24.7
[pip3] torch-tensorrt==2.5.0a0
[pip3] torchvision==0.19.0a0
[pip3] transformers==4.44.2
[pip3] triton==3.0.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.0@COMMIT_HASH_PLACEHOLDER
vLLM Build Flags:
CUDA Archs: 9.0+PTX; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 NIC0 NIC1 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NODE NODE 0-71 0 1
NIC0 NODE X PIX
NIC1 NODE PIX X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
🐛 Describe the bug
FastAPI released 0.113.0 about 5 hours ago. This release includes a major refactor of its Pydantic support, which appears to cause a Pydantic failure when calling the OpenAI-compatible API.
Confirmed that reverting to FastAPI 0.112.2 resolves the problem (pip install fastapi==0.112.2).
Here are the logs from the failure:
INFO: 172.16.10.6:40700 - "GET /v1/models HTTP/1.1" 200 OK
INFO: 172.16.10.6:39032 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/pydantic/type_adapter.py", line 277, in _init_core_attrs
self._core_schema = _getattr_no_parents(self._type, '__pydantic_core_schema__')
File "/usr/local/lib/python3.10/dist-packages/pydantic/type_adapter.py", line 119, in _getattr_no_parents
raise AttributeError(attribute)
AttributeError: __pydantic_core_schema__
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 113, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 187, in __call__
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 165, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 62, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 51, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 715, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 735, in app
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 288, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 76, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 62, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 51, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 73, in app
response = await f(request)
File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 291, in app
solved_result = await solve_dependencies(
File "/usr/local/lib/python3.10/dist-packages/fastapi/dependencies/utils.py", line 639, in solve_dependencies
) = await request_body_to_args( # body_params checked above
File "/usr/local/lib/python3.10/dist-packages/fastapi/dependencies/utils.py", line 810, in request_body_to_args
fields_to_extract = get_model_fields(first_field.type_)
File "/usr/local/lib/python3.10/dist-packages/fastapi/_compat.py", line 283, in get_model_fields
return [
File "/usr/local/lib/python3.10/dist-packages/fastapi/_compat.py", line 284, in <listcomp>
ModelField(field_info=field_info, name=name)
File "<string>", line 6, in __init__
File "/usr/local/lib/python3.10/dist-packages/fastapi/_compat.py", line 109, in __post_init__
self._type_adapter: TypeAdapter[Any] = TypeAdapter(
File "/usr/local/lib/python3.10/dist-packages/pydantic/type_adapter.py", line 264, in __init__
self._init_core_attrs(rebuild_mocks=False)
File "/usr/local/lib/python3.10/dist-packages/pydantic/type_adapter.py", line 142, in wrapped
return func(self, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pydantic/type_adapter.py", line 284, in _init_core_attrs
self._core_schema = _get_schema(self._type, config_wrapper, parent_depth=self._parent_depth)
File "/usr/local/lib/python3.10/dist-packages/pydantic/type_adapter.py", line 102, in _get_schema
schema = gen.generate_schema(type_)
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 512, in generate_schema
schema = self._generate_schema_inner(obj)
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 768, in _generate_schema_inner
return self._annotated_schema(obj)
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 1822, in _annotated_schema
schema = self._apply_annotations(source_type, annotations)
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 1890, in _apply_annotations
schema = get_inner_schema(source_type)
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_schema_generation_shared.py", line 83, in __call__
schema = self._handler(source_type)
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 1972, in new_handler
schema = metadata_get_schema(source, get_inner_schema)
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 1968, in <lambda>
lambda source, handler: handler(source)
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_schema_generation_shared.py", line 83, in __call__
schema = self._handler(source_type)
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 1972, in new_handler
schema = metadata_get_schema(source, get_inner_schema)
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_std_types_schema.py", line 316, in __get_pydantic_core_schema__
items_schema = handler.generate_schema(self.item_source_type)
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_schema_generation_shared.py", line 97, in generate_schema
return self._generate_schema.generate_schema(source_type)
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 512, in generate_schema
schema = self._generate_schema_inner(obj)
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 789, in _generate_schema_inner
return self.match_type(obj)
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 871, in match_type
return self._match_generic_type(obj, origin)
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 895, in _match_generic_type
return self._union_schema(obj)
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 1207, in _union_schema
choices.append(self.generate_schema(arg))
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 512, in generate_schema
schema = self._generate_schema_inner(obj)
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 789, in _generate_schema_inner
return self.match_type(obj)
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 837, in match_type
return self._typed_dict_schema(obj, None)
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 1309, in _typed_dict_schema
for field_name, annotation in get_type_hints_infer_globalns(
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_fields.py", line 57, in get_type_hints_infer_globalns
return get_type_hints(obj, globalns=globalns, localns=localns, include_extras=include_extras)
File "/usr/lib/python3.10/typing.py", line 1833, in get_type_hints
value = _eval_type(value, base_globals, base_locals)
File "/usr/lib/python3.10/typing.py", line 327, in _eval_type
return t._evaluate(globalns, localns, recursive_guard)
File "/usr/lib/python3.10/typing.py", line 694, in _evaluate
eval(self.__forward_code__, globalns, localns),
File "<string>", line 1, in <module>
TypeError: 'pydantic_core._pydantic_core.PydanticUndefinedType' object is not subscriptable
INFO: 172.16.10.6:39048 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
(The same traceback as above is raised again for this second request.)
Before submitting a new issue...
- [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
I believe I was able to find a solution to this. It is related to openai-python #1454.
I'm not sure why it works with FastAPI 0.112.2 but fails with 0.113.0.
Problem line:
https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py#L286
async def create_chat_completion(request: ChatCompletionRequest,
                                 raw_request: Request):
Confirmed Fix:
async def create_chat_completion(request: Annotated[dict, ChatCompletionRequest],
                                 raw_request: Request):
I'll open a PR for this and reference the issue. I can also add some try/except handling around TypeAdapter validation, unless that's seen as unnecessary or as a performance concern.
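For reference, here is a rough sketch of what the try/except TypeAdapter variant could look like. This is not the actual vLLM handler or the patch I'm proposing verbatim; `app` just stands in for the server's existing FastAPI app/router.

```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from pydantic import TypeAdapter, ValidationError

from vllm.entrypoints.openai.protocol import ChatCompletionRequest

app = FastAPI()  # stand-in for the existing app/router in api_server.py

# Build the adapter once; validating the body ourselves sidesteps FastAPI's
# per-field ModelField/TypeAdapter machinery, which is what trips over the
# pydantic 2.8.x bug.
chat_request_adapter = TypeAdapter(ChatCompletionRequest)


@app.post("/v1/chat/completions")
async def create_chat_completion(raw_request: Request):
    try:
        request = chat_request_adapter.validate_python(await raw_request.json())
    except ValidationError as exc:
        return JSONResponse(status_code=400, content={"error": str(exc)})
    # ... hand `request` off to the existing chat-completion serving logic
```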
Minimal example of triggering the issue:
A quick guide to running the latest vllm-openai container, upgrading FastAPI, and triggering the issue. It also includes instructions for quickly switching to an editable install.
Pre-requisites:
- Requires a system with a GPU and the ability to pass the GPU into a Docker container (e.g. via nvidia-container-toolkit)
Download and start the latest vllm container:
docker run --gpus all -it --rm --network=host --ipc=host --entrypoint /bin/bash vllm/vllm-openai:latest
Working Example with 0.112.2:
Show current fastapi version:
python3 -c "import fastapi; print(fastapi.__version__)"
0.112.2
Start the server with a small model:
python3 -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
POST to /v1/chat/completions:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "facebook/opt-125m",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who won the world series in 2020?"}
]
}'
Non-working example after upgrading fastapi:
Upgrade fastapi to 0.113.0 or higher
pip install --upgrade fastapi==0.113.0
Start openai-compatible api_server:
python3 -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
From outside the container, attempt a POST to /v1/chat/completions:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "facebook/opt-125m",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who won the world series in 2020?"}
]
}'
Dev setup using precompiled C binaries (saves hours of compilation when running pip install -e .):
Start the Docker container as above.
Install the NVIDIA development packages:
apt-get update && apt-get install -y --no-install-recommends libtinfo5 libncursesw5 \
cuda-cudart-dev-12-4=12.4.127-1 \
cuda-command-line-tools-12-4=12.4.1-1 \
cuda-minimal-build-12-4=12.4.1-1 \
cuda-libraries-dev-12-4=12.4.1-1 \
cuda-nvml-dev-12-4=12.4.127-1 \
cuda-nvprof-12-4=12.4.127-1 \
libnpp-dev-12-4=12.2.5.30-1 \
libcusparse-dev-12-4=12.3.1.170-1 \
libcublas-dev-12-4=12.4.5.8-1 \
libnccl2=2.21.5-1+cuda12.4 \
libnccl-dev=2.21.5-1+cuda12.4 \
cuda-nsight-compute-12-4=12.4.1-1
Build vLLM in editable mode using the precompiled binaries:
git clone https://github.com/vllm-project/vllm.git /vllm
cd /vllm
cp /usr/local/lib/python3.10/dist-packages/vllm/*.so /vllm/vllm
VLLM_USE_PRECOMPILED=1 pip install -e .
Run the API server:
python3 ./vllm/entrypoints/openai/api_server.py --model facebook/opt-125m
Example of an inference request:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "facebook/opt-125m",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who won the world series in 2020?"}
],
"chat_template": "{% if messages[0][\"role\"] == \"system\" %}{{ messages[0][\"content\"] }}\n{% endif %}{% for message in messages[1:] %}{% if message[\"role\"] == \"user\" %}Human: {{ message[\"content\"] }}\n{% elif message[\"role\"] == \"assistant\" %}Assistant: {{ message[\"content\"] }}\n{% endif %}{% endfor %}Assistant:",
"max_tokens": 100
}'
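For completeness, an equivalent chat completion request (without the chat_template override) can also be sent with the official openai Python client; this assumes openai>=1.0 is installed on the host and that the server has no API key configured, so any placeholder key works:

```python
from openai import OpenAI

# Point the official client at the local vLLM OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="facebook/opt-125m",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
    ],
    max_tokens=100,
)
print(completion.choices[0].message.content)
```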
I resolved the issue by downgrading FastAPI to version 0.111.0:
pip install fastapi==0.111.0
For reference, I'm using vllm==0.6.0.
A few things I noticed:
- These issues seem to be related to this change: https://github.com/fastapi/fastapi/commit/aa21814a89853c17c139054a5c51f0bb1ea68a0a (the new get_model_fields() list comprehension).
- This particular issue seems to happen around TypedDicts; from the stack trace above:
# ...
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 837, in match_type
return self._typed_dict_schema(obj, None)
This may be a red herring, but I'm wondering if there's some weirdness with Required or similar TypedDict hints.
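For context, the message param types coming from openai-python are TypedDicts that use Required, roughly like the sketch below (an assumption about their shape, for illustration only; this snippet does not by itself reproduce the failure):

```python
from typing_extensions import Required, TypedDict


# Rough sketch of an openai-python message param; the real type narrows
# `role` with a Literal and allows richer `content` values.
class SystemMessageParamSketch(TypedDict, total=False):
    role: Required[str]
    content: Required[str]
```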
Anyway, smallest reproducible example:
$ pip install vllm==0.6.0 fastapi==0.113.0 pydantic==2.8.2
$ python
Python 3.11.4 (main, Nov 28 2023, 16:28:36) [Clang 15.0.0 (clang-1500.0.40.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from fastapi._compat import get_model_fields
>>> from vllm.entrypoints.openai.protocol import ChatCompletionRequest
INFO 09-10 02:16:23 importing.py:10] Triton not installed; certain GPU-related functions will not be available.
WARNING 09-10 02:16:23 _custom_ops.py:18] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
>>> get_model_fields(ChatCompletionRequest)
Traceback (most recent call last):
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 277, in _init_core_attrs
self._core_schema = _getattr_no_parents(self._type, '__pydantic_core_schema__')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 119, in _getattr_no_parents
raise AttributeError(attribute)
AttributeError: __pydantic_core_schema__
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/fastapi/_compat.py", line 283, in get_model_fields
return [
^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/fastapi/_compat.py", line 284, in <listcomp>
ModelField(field_info=field_info, name=name)
File "<string>", line 6, in __init__
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/fastapi/_compat.py", line 109, in __post_init__
self._type_adapter: TypeAdapter[Any] = TypeAdapter(
^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 264, in __init__
self._init_core_attrs(rebuild_mocks=False)
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 142, in wrapped
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 284, in _init_core_attrs
self._core_schema = _get_schema(self._type, config_wrapper, parent_depth=self._parent_depth)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 102, in _get_schema
schema = gen.generate_schema(type_)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 512, in generate_schema
schema = self._generate_schema_inner(obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 768, in _generate_schema_inner
return self._annotated_schema(obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1822, in _annotated_schema
schema = self._apply_annotations(source_type, annotations)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1890, in _apply_annotations
schema = get_inner_schema(source_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_schema_generation_shared.py", line 83, in __call__
schema = self._handler(source_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1972, in new_handler
schema = metadata_get_schema(source, get_inner_schema)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1968, in <lambda>
lambda source, handler: handler(source)
^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_schema_generation_shared.py", line 83, in __call__
schema = self._handler(source_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1972, in new_handler
schema = metadata_get_schema(source, get_inner_schema)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_std_types_schema.py", line 316, in __get_pydantic_core_schema__
items_schema = handler.generate_schema(self.item_source_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_schema_generation_shared.py", line 97, in generate_schema
return self._generate_schema.generate_schema(source_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 512, in generate_schema
schema = self._generate_schema_inner(obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 789, in _generate_schema_inner
return self.match_type(obj)
^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 871, in match_type
return self._match_generic_type(obj, origin)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 895, in _match_generic_type
return self._union_schema(obj)
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1207, in _union_schema
choices.append(self.generate_schema(arg))
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 512, in generate_schema
schema = self._generate_schema_inner(obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 789, in _generate_schema_inner
return self.match_type(obj)
^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 837, in match_type
return self._typed_dict_schema(obj, None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1309, in _typed_dict_schema
for field_name, annotation in get_type_hints_infer_globalns(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_fields.py", line 57, in get_type_hints_infer_globalns
return get_type_hints(obj, globalns=globalns, localns=localns, include_extras=include_extras)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/.pyenv/versions/3.11.4/lib/python3.11/typing.py", line 2336, in get_type_hints
value = _eval_type(value, base_globals, base_locals)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/.pyenv/versions/3.11.4/lib/python3.11/typing.py", line 371, in _eval_type
return t._evaluate(globalns, localns, recursive_guard)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/.pyenv/versions/3.11.4/lib/python3.11/typing.py", line 877, in _evaluate
eval(self.__forward_code__, globalns, localns),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 1, in <module>
TypeError: 'pydantic_core._pydantic_core.PydanticUndefinedType' object is not subscriptable
>>>
Note that pydantic==2.9.0 does not have this issue.
$ pip install pydantic==2.9.0
$ python
Python 3.11.4 (main, Nov 28 2023, 16:28:36) [Clang 15.0.0 (clang-1500.0.40.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from fastapi._compat import get_model_fields
>>> from vllm.entrypoints.openai.protocol import ChatCompletionRequest
INFO 09-10 02:26:12 importing.py:10] Triton not installed; certain GPU-related functions will not be available.
WARNING 09-10 02:26:12 _custom_ops.py:18] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
>>> get_model_fields(ChatCompletionRequest)
[ModelField(field_info=FieldInfo(annotation=List[Union[ChatCompletionSystemMessageParam, ChatCompletionUserMessageParam, ChatCompletionAssistantMessageParam, ChatCompletionToolMessageParam, ChatCompletionFunctionMessageParam, CustomChatCompletionMessageParam]], required=True), name='messages', mode='validation'), ModelField(field_info=FieldInfo(annotation=str, required=True), name='model', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[float, NoneType], required=False, default=0.0), name='frequency_penalty', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[Dict[str, float], NoneType], required=False, default=None), name='logit_bias', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[bool, NoneType], required=False, default=False), name='logprobs', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=0), name='top_logprobs', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=None), name='max_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=1), name='n', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[float, NoneType], required=False, default=0.0), name='presence_penalty', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[ResponseFormat, NoneType], required=False, default=None), name='response_format', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=None, metadata=[Ge(ge=-9223372036854775808), Le(le=9223372036854775807)]), name='seed', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, List[str], NoneType], required=False, default_factory=list), name='stop', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[bool, NoneType], required=False, default=False), name='stream', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[StreamOptions, NoneType], required=False, default=None), name='stream_options', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[float, NoneType], required=False, default=0.7), name='temperature', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[float, NoneType], required=False, default=1.0), name='top_p', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[List[ChatCompletionToolsParam], NoneType], required=False, default=None), name='tools', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[Literal['none'], Literal['auto'], ChatCompletionNamedToolChoiceParam, NoneType], required=False, default='none'), name='tool_choice', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[bool, NoneType], required=False, default=False), name='parallel_tool_calls', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None), name='user', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=None), name='best_of', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False), name='use_beam_search', mode='validation'), ModelField(field_info=FieldInfo(annotation=int, required=False, default=-1), name='top_k', mode='validation'), ModelField(field_info=FieldInfo(annotation=float, required=False, default=0.0), name='min_p', mode='validation'), 
ModelField(field_info=FieldInfo(annotation=float, required=False, default=1.0), name='repetition_penalty', mode='validation'), ModelField(field_info=FieldInfo(annotation=float, required=False, default=1.0), name='length_penalty', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False), name='early_stopping', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[List[int], NoneType], required=False, default_factory=list), name='stop_token_ids', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False), name='include_stop_str_in_output', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False), name='ignore_eos', mode='validation'), ModelField(field_info=FieldInfo(annotation=int, required=False, default=0), name='min_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=True), name='skip_special_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=True), name='spaces_between_special_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[Annotated[int, FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=1)])], NoneType], required=False, default=None), name='truncate_prompt_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=None), name='prompt_logprobs', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False, description='If true, the new message will be prepended with the last message if they belong to the same role.'), name='echo', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=True, description='If true, the generation prompt will be added to the chat template. This is a parameter used by chat template in tokenizer config of the model.'), name='add_generation_prompt', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False, description='If true, special tokens (e.g. BOS) will be added to the prompt on top of what is added by the chat template. For most models, the chat template takes care of adding the special tokens so this should be set to false (as is the default).'), name='add_special_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[List[Dict[str, str]], NoneType], required=False, default=None, description='A list of dicts representing documents that will be accessible to the model if it is performing RAG (retrieval-augmented generation). If the template does not support RAG, this argument will have no effect. We recommend that each document should be a dict containing "title" and "text" keys.'), name='documents', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description='A Jinja template to use for this conversion. As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.'), name='chat_template', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[Dict[str, Any], NoneType], required=False, default=None, description='Additional kwargs to pass to the template renderer. 
Will be accessible by the chat template.'), name='chat_template_kwargs', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, dict, BaseModel, NoneType], required=False, default=None, description='If specified, the output will follow the JSON schema.'), name='guided_json', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description='If specified, the output will follow the regex pattern.'), name='guided_regex', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None, description='If specified, the output will be exactly one of the choices.'), name='guided_choice', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description='If specified, the output will follow the context free grammar.'), name='guided_grammar', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description="If specified, will override the default guided decoding backend of the server for this specific request. If set, must be either 'outlines' / 'lm-format-enforcer'"), name='guided_decoding_backend', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description='If specified, will override the default whitespace pattern for guided json decoding.'), name='guided_whitespace_pattern', mode='validation')]
>>>
This makes me think this is a Pydantic issue, or at least a confluence of factors across openai / pydantic / fastapi.
Checking @pachewise's code, I was able to reduce the error reproduction to:
from typing_extensions import Annotated
from typing import List
from vllm.entrypoints.chat_utils import (
    ChatCompletionMessageParam,
)
from vllm.entrypoints.openai.protocol import ChatCompletionRequest
from pydantic import TypeAdapter

for name, field in ChatCompletionRequest.model_fields.items():
    print(name, field)
    TypeAdapter(Annotated[List[ChatCompletionMessageParam], field])
That doesn't use FastAPI; it's just Pydantic. And indeed, it's fixed by upgrading Pydantic to 2.9.0. :tada:
It wasn't breaking with FastAPI before because, prior to 0.113.0, that part of the code didn't use TypeAdapter yet; it seems the previous version of Pydantic had a bug there (not sure exactly where, but it's already fixed in 2.9.0).
Glad that it's resolved! Does the issue still occur in FastAPI 0.113.1 with Pydantic 2.8? If so, we may have to update either fastapi or pydantic in our dependencies to make sure that users don't install the faulty versions.
@DarkLight1337 yes, I'd recommend fastapi >= 0.114.1 (to fix a performance issue related to this part of their code) and pydantic >= 2.9.0 (to fix the actual issue we're seeing here).
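As a quick sanity check in an existing environment, something like the following works (a throwaway sketch; it assumes the packaging module is available, which it usually is alongside pip):

```python
from importlib.metadata import version

from packaging.version import Version

# Recommended floors from this thread: fastapi >= 0.114.1, pydantic >= 2.9.0.
assert Version(version("fastapi")) >= Version("0.114.1"), version("fastapi")
assert Version(version("pydantic")) >= Version("2.9.0"), version("pydantic")
```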
Unfortunately the fastapi bump has broken Ray 2.9 compatibility.
$ pip install vllm==0.6.1.post2 'ray[serve]==2.9.3'
... snip...
The conflict is caused by:
vllm 0.6.1.post2 depends on fastapi>=0.114.1; python_version >= "3.9"
ray[serve] 2.9.3 depends on fastapi<=0.108.0; extra == "serve"
I've prepped a fix for the Ray 2.9 regression introduced in a different PR, but it won't really help unless we address the fastapi pin here as well.
Can we lower the fastapi pinned version, since it wasn't actually the cause of the issue, so we maintain the Ray 2.9 compatibility?
On it!