vllm
[Frontend] re-enable multi-modality input in the new beam search implementation
Changes in this PR:
This PR introduces the following changes based on the updated beam search implementation:
- Re-enable multi-modality input: Support for multi-modality input has been re-enabled for beam search with OpenAI-compatible endpoints.
- Logprobs handling in ChatCompletionRequest: Added additional validation to disable logprobs when use_beam_search=True. Since beam search selects results based on cumulative logprobs and determines step logprobs by beam_width, it ignores the top_logprobs and logprobs parameters passed in with the request.
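The logprobs rule described above can be pictured with a minimal sketch. It assumes the validation rejects the request outright with an error; the class name ChatRequestSketch and its __post_init__ check are hypothetical stand-ins for illustration, not the actual ChatCompletionRequest code in this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatRequestSketch:
    """Hypothetical stand-in for the request model (not vLLM's class)."""

    use_beam_search: bool = False
    logprobs: Optional[bool] = False
    top_logprobs: Optional[int] = None

    def __post_init__(self) -> None:
        # Beam search ranks candidates by cumulative logprob and derives
        # step logprobs from the beam width, so these request fields would
        # be ignored. Assumption: fail early instead of ignoring them.
        if self.use_beam_search and (self.logprobs or self.top_logprobs):
            raise ValueError(
                "logprobs and top_logprobs are not supported "
                "when use_beam_search=True"
            )
```

Under this assumption, a request such as ChatRequestSketch(use_beam_search=True, logprobs=True) fails at construction time, while the same fields remain valid when beam search is off.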
Unit Test
Added two additional test cases in tests/entrypoints/openai/test_vision.py.
Manual Testing
The following command was used to launch the server for manual testing:
vllm serve microsoft/Phi-3.5-vision-instruct --api-key token-abc123 --trust-remote-code --max-model-len 4096 --limit-mm-per-prompt image=2
Client script used to test the changes:
import openai
import asyncio

url = "http://localhost:"
client = openai.AsyncOpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123"
)

# Image URLs
img_urls = [
    "https://upload.wikimedia.org/wikipedia/commons/c/cb/Brachiosaurus_DB_flipped.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/3/3d/Allosaurus_Revised.jpg"
]

# Define the messages for the chat completion
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": img_urls[0]
                }
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": img_urls[1]
                }
            },
            {
                "type": "text",
                "text": "what are the animals in the images?"
            }
        ]
    }
]


async def make_request():
    try:
        response = await client.chat.completions.create(
            model="microsoft/Phi-3.5-vision-instruct",
            max_tokens=32,
            temperature=0,
            messages=messages,
            n=2,
            extra_body={"use_beam_search": True}
        )
        for choice in response.choices:
            print(choice.message.content)
    except openai.BadRequestError as e:
        print(f"Error: {e.code}")


asyncio.run(make_request())
Verified the functionality of multi-image input handling and correct response generation using beam search with the above manual tests.
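For repeating the manual test with other images, the message structure from the client script can be generated instead of written by hand. The helper below is a small sketch; build_image_messages is a hypothetical name introduced here for illustration and is not part of this PR or of the openai package.

```python
from typing import Any


def build_image_messages(image_urls: list[str], prompt: str) -> list[dict[str, Any]]:
    """Build one user message: an image_url part per URL, then the text prompt."""
    content: list[dict[str, Any]] = [
        {"type": "image_url", "image_url": {"url": image_url}}
        for image_url in image_urls
    ]
    # The text part goes last, matching the order used in the client script.
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]
```

Calling build_image_messages(img_urls, "what are the animals in the images?") yields the same structure as the hand-written messages list. The number of URLs should stay within the --limit-mm-per-prompt value given when launching the server (image=2 in the command above).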