feat: Update OpenAI spec to include image url in message content
Before submitting
- [x] Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
- [x] Did you read the contributor guideline, Pull Request section?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
What does this PR do?
Fixes #107.
PR review
Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃
Thanks for the PR @bhimrazy! Can you add a test and a minimal end to end example on the readme?
@bhimrazy sick! super excited to try this.
@lantiga do we have a guide or something to show how to add the test and example?
no, good point /cc @aniketmaurya
@bhimrazy for now you can take inspiration from:
- https://github.com/Lightning-AI/LitServe/blob/main/tests/test_specs.py for unit tests
- https://github.com/Lightning-AI/LitServe/blob/main/tests/e2e/test_e2e.py#L90 for the end to end test
- https://github.com/Lightning-AI/LitServe/blob/main/README.md?plain=1#L548 for the README examples
thank you for the PR @bhimrazy! as Luca mentioned, you can take inspiration from the existing LitSpec test cases.
Maybe you can try sending a request with image content to the server and check that it parses the request without breaking.
{
  "model": "lit",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {
          "type": "image_url",
          "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
        }
      ]
    }
  ]
}
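As a rough illustration of what such a check could assert, here is a standard-library-only sketch. It does not use LitServe's test utilities or its actual request models; `validate_content` is a hypothetical helper, the URL is a placeholder, and the accepted shapes (a plain string, or a list of `text` / `image_url` parts where `image_url` is a string) are assumed from the payload above.

```python
def validate_content(content):
    """Return True if `content` is a string or a list of well-formed parts.

    Hypothetical helper: mirrors the payload shape discussed in this thread,
    not LitServe's real validation logic.
    """
    if isinstance(content, str):
        return True
    if not isinstance(content, list):
        return False
    for part in content:
        if not isinstance(part, dict):
            return False
        if part.get("type") == "text":
            if not isinstance(part.get("text"), str):
                return False
        elif part.get("type") == "image_url":
            if not isinstance(part.get("image_url"), str):
                return False
        else:
            # Unknown part type
            return False
    return True


# Placeholder payload in the same shape as the request above
payload = {
    "model": "lit",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": "https://example.com/image.jpg"},
            ],
        }
    ],
}

all_valid = all(validate_content(m["content"]) for m in payload["messages"])
```

A real test would instead post this payload to the running server and assert on the HTTP status and response body, following the existing test files linked above.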
Thanks, @lantiga, @williamFalcon, and @aniketmaurya for all of this feedback. I will go through the given examples and add the test cases and e2e example.
awesome @bhimrazy!! don't hesitate to reach out if you need any help.
Hi @aniketmaurya @lantiga , Could you please help me through the addition of end-to-end documentation to the README? I have prepared a draft version of it, included below.
LitServe's OpenAISpec can also handle images in the input. Below is an example of how to set this up using LitServe.
import litserve as ls
from litserve.specs.openai import ChatMessage

class OpenAISpecLitAPI(ls.LitAPI):
    def setup(self, device):
        self.model = None

    def predict(self, x):
        yield {"role": "assistant", "content": "This is a generated output"}

    def encode_response(self, output: dict) -> ChatMessage:
        yield ChatMessage(role="assistant", content="This is a custom encoded output")

if __name__ == "__main__":
    server = ls.LitServer(OpenAISpecLitAPI(), spec=ls.OpenAISpec())
    server.run(port=8000)
In this case, predict is expected to take an input with the following shape:
- Text Input Example:
  [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello there"},
    {"role": "assistant", "content": "Hello, how can I help?"},
    {"role": "user", "content": "What is the capital of Australia?"}
  ]
- Mixed Text and Image Input Example:
  [
    {"role": "system", "content": "You are a helpful assistant."},
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {
          "type": "image_url",
          "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
        }
      ]
    },
    {"role": "assistant", "content": "A wooden boardwalk through a green field under a blue sky."},
    {"role": "user", "content": "How is the weather depicted in the image?"}
  ]
The above server can be queried with a plain HTTP request that follows the OpenAI chat completions format:
import requests

response = requests.post("http://127.0.0.1:8000/v1/chat/completions", json={
    "model": "my-gpt2",
    "stream": False,  # You can stream chunked response by setting this True
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            ]
        }
    ]
})
looks good @bhimrazy! would be nice if you can show that image content could be processed in predict or decode_request step. For example:
def predict(self, x):
    # Illustrative only: `x` is treated here as a single message dict
    if isinstance(x["content"], list):
        # do something with image url (second content part in the example request)
        image_url = x["content"][1]["image_url"]
        yield {"role": "assistant", "content": "the image describes nature and bla bla..."}
    else:
        yield {"role": "assistant", "content": "This is a generated output."}
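One way to avoid relying on the image part sitting at a fixed index is to walk the content parts by type. The sketch below is standalone Python with no LitServe dependency. `split_content` is a hypothetical helper, the URL is a placeholder, and it assumes each message is a plain dict whose `image_url` is a string, as in the examples in this thread.

```python
def split_content(message):
    """Split one chat message dict into (texts, image_urls).

    Handles both shapes discussed in this thread: `content` as a plain
    string, or as a list of {"type": "text"} / {"type": "image_url"} parts.
    """
    content = message["content"]
    if isinstance(content, str):
        return [content], []

    texts, image_urls = [], []
    for part in content:
        if part["type"] == "text":
            texts.append(part["text"])
        elif part["type"] == "image_url":
            image_urls.append(part["image_url"])
    return texts, image_urls


# Placeholder message in the mixed text-and-image shape
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": "https://example.com/boardwalk.jpg"},
    ],
}

texts, image_urls = split_content(message)
```

Inside `predict`, a non-empty `image_urls` list could then select the image-handling branch regardless of where the image part appears in `content`.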
Sure, thanks!
Hi @lantiga, The PR is ready for review. Thank you!
Awesome @bhimrazy, reviewing now!
Awesome job @bhimrazy, let's see what CI thinks and then we're ready to merge!
Let's goo! Merged 🚀
congrats @bhimrazy! solid contribution