LitServe

feat: Update OpenAI spec to include image url in message content

bhimrazy opened this issue 1 year ago • 8 comments

Before submitting
  • [x] Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
  • [x] Did you read the contributor guideline, Pull Request section?
  • [ ] Did you make sure to update the docs?
  • [ ] Did you write any new necessary tests?

What does this PR do?

Fixes #107.

PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

bhimrazy, May 23 '24 17:05

Thanks for the PR @bhimrazy! Can you add a test and a minimal end-to-end example to the README?

lantiga, May 23 '24 17:05

@bhimrazy sick! super excited to try this.

@lantiga do we have a guide or something showing how to add the test and example?

williamFalcon, May 23 '24 18:05

no, good point /cc @aniketmaurya

@bhimrazy for now you can take inspiration from:

  • https://github.com/Lightning-AI/LitServe/blob/main/tests/test_specs.py for unit tests
  • https://github.com/Lightning-AI/LitServe/blob/main/tests/e2e/test_e2e.py#L90 for the end to end test
  • https://github.com/Lightning-AI/LitServe/blob/main/README.md?plain=1#L548 for the README examples

lantiga, May 23 '24 19:05

Thank you for the PR @bhimrazy! As Luca mentioned, you can take inspiration from the existing LitSpec test cases.

Maybe you can try sending a request with image content to the server and check that it parses the request without breaking:

{
  "model": "lit",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {
          "type": "image_url",
          "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
        }
      ]
    }
  ]
}
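A stdlib-only sketch of the check such a test could make: the hypothetical validate_messages helper below (not LitServe's own parser) accepts message content that is either a plain string or a list of typed parts, text or image_url, with image_url given as a string as in the payload above. A real test would instead post this payload to a running server, as the linked e2e tests do.

```python
from typing import Any

ROLES = {"system", "user", "assistant"}


def validate_messages(messages: list[dict[str, Any]]) -> None:
    """Hypothetical shape check for the payload above; raises ValueError on bad input."""
    for i, message in enumerate(messages):
        if message.get("role") not in ROLES:
            raise ValueError(f"message {i}: unexpected role {message.get('role')!r}")
        content = message.get("content")
        if isinstance(content, str):
            continue  # plain text message
        if not isinstance(content, list):
            raise ValueError(f"message {i}: content must be a string or a list of parts")
        for part in content:
            is_text = (
                isinstance(part, dict)
                and part.get("type") == "text"
                and isinstance(part.get("text"), str)
            )
            is_image = (
                isinstance(part, dict)
                and part.get("type") == "image_url"
                and isinstance(part.get("image_url"), str)
            )
            if not (is_text or is_image):
                raise ValueError(f"message {i}: invalid content part {part!r}")


request_body = {
    "model": "lit",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            ],
        }
    ],
}
validate_messages(request_body["messages"])  # no exception: text + image parts are accepted
```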

aniketmaurya, May 23 '24 19:05

Thanks, @lantiga, @williamFalcon, and @aniketmaurya for all of this feedback. I will go through the given examples and add the test cases and the e2e example.

bhimrazy, May 23 '24 19:05

awesome @bhimrazy!! don't hesitate to reach out if you need any help.

aniketmaurya, May 23 '24 19:05

Hi @aniketmaurya @lantiga, could you please help me with adding end-to-end documentation to the README? I have prepared a draft version, included below.


LitServe's OpenAISpec can also handle images in the input. Below is an example of how to set this up using LitServe.

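import litserve as ls
from litserve.specs.openai import ChatMessage


class OpenAISpecLitAPI(ls.LitAPI):
    def setup(self, device):
        # Load the (multimodal) model here; this placeholder skips it.
        self.model = None

    def predict(self, x):
        # With OpenAISpec, x is the list of chat messages; each message's content
        # is either a string or a list of text/image_url parts (see below).
        yield {"role": "assistant", "content": "This is a generated output"}

    def encode_response(self, output):
        # A generator: yields ChatMessage objects that OpenAISpec turns into the response.
        yield ChatMessage(role="assistant", content="This is a custom encoded output")


if __name__ == "__main__":
    server = ls.LitServer(OpenAISpecLitAPI(), spec=ls.OpenAISpec())
    server.run(port=8000)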

In this case, predict is expected to take an input with the following shape:

  • Text Input Example:

    [{"role": "system", "content": "You are a helpful assistant."},
     {"role": "user", "content": "Hello there"},
     {"role": "assistant", "content": "Hello, how can I help?"},
     {"role": "user", "content": "What is the capital of Australia?"}]
    
  • Mixed Text and Image Input Example:

    [{"role": "system", "content": "You are a helpful assistant."},
     {"role": "user",
      "content": [
          {"type": "text", "text": "What's in this image?"},
          {
              "type": "image_url",
              "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
          },
      ]},
     {"role": "assistant", "content": "A wooden boardwalk through a green field under a blue sky."},
     {"role": "user", "content": "How is the weather depicted in the image?"}]
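Inside predict, the two shapes above can be told apart by checking whether content is a string or a list. In the mixed example the image sits in an earlier user turn, so answering the follow-up question needs the whole history scanned. A minimal sketch, assuming the messages arrive as plain dicts exactly as shown; the helper split_conversation is hypothetical, not part of LitServe:

```python
from typing import Any

IMAGE_URL = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"


def split_conversation(messages: list[dict[str, Any]]) -> tuple[str, list[str]]:
    """Hypothetical helper: (text of the latest user message, every image URL in the chat)."""
    latest_user_text = ""
    image_urls: list[str] = []
    for message in messages:
        content = message["content"]
        # Normalize plain-string content into the list-of-parts form.
        parts = [{"type": "text", "text": content}] if isinstance(content, str) else content
        image_urls.extend(p["image_url"] for p in parts if p["type"] == "image_url")
        if message["role"] == "user":
            latest_user_text = " ".join(p["text"] for p in parts if p["type"] == "text")
    return latest_user_text, image_urls


mixed_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": IMAGE_URL},
        ],
    },
    {"role": "assistant", "content": "A wooden boardwalk through a green field under a blue sky."},
    {"role": "user", "content": "How is the weather depicted in the image?"},
]
text, urls = split_conversation(mixed_messages)
print(text)  # How is the weather depicted in the image?
print(urls == [IMAGE_URL])  # True
```

Normalizing string content into a one-element list of text parts keeps a single code path for both input shapes.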
    

The server above can be queried with a plain HTTP POST request using requests:

import requests

response = requests.post("http://127.0.0.1:8000/v1/chat/completions", json={
    "model": "my-gpt2",
    "stream": False,  # You can stream a chunked response by setting this to True
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            ]
        }
    ]
})
print(response.json())
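Setting "stream" to True returns the reply in chunks. Assuming the server follows the OpenAI server-sent-events convention (data: {...} lines carrying a choices[].delta.content field, ending with data: [DONE]), which this thread does not confirm, a client could reassemble the text with a small helper such as the hypothetical collect_streamed_text below. The sample lines are illustrative, not captured server output.

```python
import json
from typing import Iterable


def collect_streamed_text(lines: Iterable[str]) -> str:
    """Hypothetical helper: join delta contents from OpenAI-style SSE lines (assumed format)."""
    pieces = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data: "):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # Some chunks may carry no content (e.g. role-only or final chunks).
            pieces.append(choice.get("delta", {}).get("content") or "")
    return "".join(pieces)


# Illustrative stream, not real server output.
sample_stream = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "A wooden "}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "boardwalk."}}]}',
    "",
    "data: [DONE]",
]
print(collect_streamed_text(sample_stream))  # A wooden boardwalk.
```

Under the same assumption, passing stream=True to requests.post and feeding response.iter_lines(decode_unicode=True) into this function should yield the full reply.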

bhimrazy, May 24 '24 09:05

looks good @bhimrazy! It would be nice if you could show how image content can be processed in the predict or decode_request step. For example:

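    def predict(self, x):
        # x is the list of chat messages; look at the latest one
        content = x[-1]["content"]
        if isinstance(content, list):
            # do something with the image URL(s) in the mixed content
            image_urls = [part["image_url"] for part in content if part["type"] == "image_url"]
            yield {"role": "assistant", "content": "the image describes nature and bla bla..."}
        else:
            yield {"role": "assistant", "content": "This is a generated output."}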
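If the image should be fetched in a decode_request step rather than in predict, each image_url could first be turned into raw bytes. A stdlib sketch; load_image_bytes is hypothetical, and handling base64 data: URLs is an assumption borrowed from the OpenAI API, not something this PR states:

```python
import base64
import urllib.request


def load_image_bytes(image_url: str, timeout: float = 10.0) -> bytes:
    """Hypothetical helper: raw image bytes for a base64 data: URL or an http(s) URL."""
    if image_url.startswith("data:"):
        # e.g. "data:image/png;base64,iVBORw0..." -> decode everything after the comma
        header, _, encoded = image_url.partition(",")
        if not header.endswith(";base64"):
            raise ValueError("only base64-encoded data URLs are handled in this sketch")
        return base64.b64decode(encoded)
    if image_url.startswith(("http://", "https://")):
        # Network fetch; a production server should probably also cap size and check content type.
        with urllib.request.urlopen(image_url, timeout=timeout) as response:
            return response.read()
    raise ValueError(f"unsupported image URL scheme: {image_url.split(':', 1)[0]!r}")


# Offline usage: a data URL needs no network access.
encoded = base64.b64encode(b"fake image bytes").decode("ascii")
print(load_image_bytes("data:image/png;base64," + encoded))  # b'fake image bytes'
```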

aniketmaurya, May 24 '24 10:05

Sure, thanks!

bhimrazy, May 24 '24 11:05

Hi @lantiga, the PR is ready for review. Thank you!

bhimrazy, May 24 '24 17:05

Awesome @bhimrazy, reviewing now!

lantiga, May 24 '24 20:05

Awesome job @bhimrazy, let's see what CI thinks and then we're ready to merge!

lantiga, May 24 '24 20:05

Let's goo! Merged 🚀

lantiga, May 24 '24 20:05

congrats @bhimrazy! solid contribution

williamFalcon, May 25 '24 00:05