vLLM does not work with image URLs
System Info
...
Information
- [ ] The official example scripts
- [ ] My own modified scripts
🐛 Describe the bug
vLLM does not work when you just pass image URLs through to it (instead of downloading the images first).
See https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/vllm/vllm.py#L166: if you change that call to download=False, the request fails.
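For context, that download flag roughly toggles between inlining the image bytes and forwarding the raw URL. A simplified sketch of the two paths (hypothetical helper, not the actual code in vllm.py):

import base64
import httpx

async def image_to_openai_content(image_url: str, download: bool) -> dict:
    if download:
        # llama-stack fetches the bytes itself and inlines them as a data URL,
        # so the vLLM server never needs to reach the network.
        async with httpx.AsyncClient() as client:
            resp = await client.get(image_url)
            resp.raise_for_status()
        media_type = resp.headers.get("content-type", "image/png")
        encoded = base64.b64encode(resp.content).decode()
        url = f"data:{media_type};base64,{encoded}"
    else:
        # The raw URL is forwarded and vLLM fetches it server-side;
        # that server-side fetch is what times out in the logs below.
        url = image_url
    return {"type": "image_url", "image_url": {"url": url}}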
How to test?
# start the vLLM server first
docker run --rm -it --gpus all -p 6001:6001 -e HUGGING_FACE_HUB_TOKEN=... \
-v /home/ashwin/.cache/huggingface:/root/.cache/huggingface \
vllm/vllm-openai:latest \
--trust-remote-code \
--gpu-memory-utilization 0.75 \
--model meta-llama/Llama-3.2-11B-Vision-Instruct --enforce-eager \
--max-model-len 4096 --max-num-seqs 16 --port 6001
pytest -v -s -k vllm tests/inference/test_vision_inference.py \
--env VLLM_URL=http://localhost:6001/v1
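You can also reproduce the failure without llama-stack by sending an image URL straight to the vLLM OpenAI-compatible endpoint. A minimal sketch (the image URL is a placeholder; use any publicly reachable image):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:6001/v1", api_key="fake")
response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                # vLLM fetches this URL server-side, which is where the timeout occurs.
                {"type": "image_url", "image_url": {"url": "https://example.com/some-image.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)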
Error logs
In the vLLM logs, you see:
INFO: 127.0.0.1:59652 - "POST /v1/chat/completions HTTP/1.1" 200 OK
ERROR 12-16 23:56:17 serving_chat.py:162] Error in loading multi-modal data
ERROR 12-16 23:56:17 serving_chat.py:162] Traceback (most recent call last):
ERROR 12-16 23:56:17 serving_chat.py:162] File "/usr/local/lib/python3.12/dist-packages/aiohttp/client.py", line 663, in _request
ERROR 12-16 23:56:17 serving_chat.py:162] conn = await self._connector.connect(
ERROR 12-16 23:56:17 serving_chat.py:162] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162] File "/usr/local/lib/python3.12/dist-packages/aiohttp/connector.py", line 563, in connect
ERROR 12-16 23:56:17 serving_chat.py:162] proto = await self._create_connection(req, traces, timeout)
ERROR 12-16 23:56:17 serving_chat.py:162] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162] File "/usr/local/lib/python3.12/dist-packages/aiohttp/connector.py", line 1032, in _create_connection
ERROR 12-16 23:56:17 serving_chat.py:162] _, proto = await self._create_direct_connection(req, traces, timeout)
ERROR 12-16 23:56:17 serving_chat.py:162] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162] File "/usr/local/lib/python3.12/dist-packages/aiohttp/connector.py", line 1335, in _create_direct_connection
ERROR 12-16 23:56:17 serving_chat.py:162] transp, proto = await self._wrap_create_connection(
ERROR 12-16 23:56:17 serving_chat.py:162] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162] File "/usr/local/lib/python3.12/dist-packages/aiohttp/connector.py", line 1091, in _wrap_create_connection
ERROR 12-16 23:56:17 serving_chat.py:162] sock = await aiohappyeyeballs.start_connection(
ERROR 12-16 23:56:17 serving_chat.py:162] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162] File "/usr/local/lib/python3.12/dist-packages/aiohappyeyeballs/impl.py", line 89, in start_connection
ERROR 12-16 23:56:17 serving_chat.py:162] sock, _, _ = await _staggered.staggered_race(
ERROR 12-16 23:56:17 serving_chat.py:162] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162] File "/usr/local/lib/python3.12/dist-packages/aiohappyeyeballs/_staggered.py", line 160, in staggered_race
ERROR 12-16 23:56:17 serving_chat.py:162] done = await _wait_one(
ERROR 12-16 23:56:17 serving_chat.py:162] ^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162] File "/usr/local/lib/python3.12/dist-packages/aiohappyeyeballs/_staggered.py", line 41, in _wait_one
ERROR 12-16 23:56:17 serving_chat.py:162] return await wait_next
ERROR 12-16 23:56:17 serving_chat.py:162] ^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162] asyncio.exceptions.CancelledError
ERROR 12-16 23:56:17 serving_chat.py:162]
ERROR 12-16 23:56:17 serving_chat.py:162] The above exception was the direct cause of the following exception:
ERROR 12-16 23:56:17 serving_chat.py:162]
ERROR 12-16 23:56:17 serving_chat.py:162] Traceback (most recent call last):
ERROR 12-16 23:56:17 serving_chat.py:162] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 160, in create_chat_completion
ERROR 12-16 23:56:17 serving_chat.py:162] mm_data = await mm_data_future
ERROR 12-16 23:56:17 serving_chat.py:162] ^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 235, in all_mm_data
ERROR 12-16 23:56:17 serving_chat.py:162] items = await asyncio.gather(*self._items)
ERROR 12-16 23:56:17 serving_chat.py:162] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162] File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/utils.py", line 140, in async_get_and_parse_image
ERROR 12-16 23:56:17 serving_chat.py:162] image = await async_fetch_image(image_url)
ERROR 12-16 23:56:17 serving_chat.py:162] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162] File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/utils.py", line 62, in async_fetch_image
ERROR 12-16 23:56:17 serving_chat.py:162] image_raw = await global_http_connection.async_get_bytes(
ERROR 12-16 23:56:17 serving_chat.py:162] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162] File "/usr/local/lib/python3.12/dist-packages/vllm/connections.py", line 92, in async_get_bytes
ERROR 12-16 23:56:17 serving_chat.py:162] async with await self.get_async_response(url, timeout=timeout) as r:
ERROR 12-16 23:56:17 serving_chat.py:162] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162] File "/usr/local/lib/python3.12/dist-packages/aiohttp/client.py", line 1359, in __aenter__
ERROR 12-16 23:56:17 serving_chat.py:162] self._resp: _RetType = await self._coro
ERROR 12-16 23:56:17 serving_chat.py:162] ^^^^^^^^^^^^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162] File "/usr/local/lib/python3.12/dist-packages/aiohttp/client.py", line 579, in _request
ERROR 12-16 23:56:17 serving_chat.py:162] with timer:
ERROR 12-16 23:56:17 serving_chat.py:162] ^^^^^
ERROR 12-16 23:56:17 serving_chat.py:162] File "/usr/local/lib/python3.12/dist-packages/aiohttp/helpers.py", line 749, in __exit__
ERROR 12-16 23:56:17 serving_chat.py:162] raise asyncio.TimeoutError from exc_val
Expected behavior
This should have worked as per the vLLM documentation. I also tried setting VLLM_IMAGE_FETCH_TIMEOUT=20 when starting the vLLM server, but that did not help.
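For comparison, inlining the image as a base64 data URL (effectively what the download=True path produces) avoids the server-side fetch entirely and should go through. A quick sketch of that workaround (placeholder URL again):

import base64
import requests

resp = requests.get("https://example.com/some-image.jpg", timeout=20)  # placeholder URL
resp.raise_for_status()
media_type = resp.headers.get("content-type", "image/jpeg")
data_url = f"data:{media_type};base64,{base64.b64encode(resp.content).decode()}"
# Use data_url in place of the plain image URL in the chat completion request above.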