🗺️ Vision / multi-modal
GPT-4o introduces a new message content type that can contain images, encoded either as a URL or as a base64 string.
Example:
```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0])
```
https://platform.openai.com/docs/guides/vision
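The same request can also pass the image inline as a base64 data URL instead of a remote URL, which is the payload shape the base64 concerns below refer to. A minimal sketch (the local file path is illustrative):

```python
import base64

from openai import OpenAI

client = OpenAI()

# Illustrative local file; any JPEG/PNG works.
with open("boardwalk.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    # Inline data URL: mime type plus base64 payload
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0])
```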
Milestone 1
- Vision support in the Python instrumentations for llama-index, openai, gemini, and langchain
- Eliminate performance degradation from base64-encoded payloads by allowing users to opt out
- Preliminary set of config flags to mask inputs and outputs that could contain sensitive info (see the sketch after this list)
- Create examples
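As a rough illustration of the opt-out and masking items above, a sketch using TraceConfig-style options from the OpenInference configuration spec; the exact flag names, defaults, and the way the config is passed to an instrumentor may differ from what shipped:

```python
from openinference.instrumentation import TraceConfig
from openinference.instrumentation.openai import OpenAIInstrumentor

# Illustrative config: mask potentially sensitive values and keep base64
# image payloads out of traces so span sizes stay small.
config = TraceConfig(
    hide_inputs=False,               # set True to mask all input values
    hide_outputs=False,              # set True to mask all output values
    hide_input_images=True,          # drop inline image payloads from traces
    base64_image_max_length=32_000,  # truncate oversized base64 images
)

OpenAIInstrumentor().instrument(config=config)
```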
Milestone N
- Image synthesis APIs such as DALL-E
Tracing
- [x] #522
- [x] #523
- [x] #539
- [ ] #562
- [x] #582
- [x] #538
- [ ] [vision] [javascript] langchain image message parsing
- [x] #557
- [x] #560
- [ ] [multi-modal] scope out video / audio semantic conventions
- [x] #567
Instrumentation
Testing
- [x] #872
Image tracing
- [x] #707
- [x] #708
- [x] #709
- [x] #710
- [ ] #711
- [x] #631
- [x] #712
- [x] #713
- [x] #714
- [x] #715
- [x] #716
- [x] #717
Context Attributes
- [x] #718
- [x] #719
- [x] #720
- [x] #721
- [x] #722
- [x] #723
- [x] #724
- [x] #725
- [x] #726
- [x] #727
- [x] #728
- [x] #729
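The checklist above covers context attributes. As a rough illustration of the feature, a sketch assuming the `using_attributes` context manager from `openinference-instrumentation` (parameter names are indicative and may differ):

```python
from openinference.instrumentation import using_attributes

# Attach session/user/metadata context to all spans created inside the block.
with using_attributes(
    session_id="session-abc-123",
    user_id="user-42",
    metadata={"experiment": "vision-rollout"},
    tags=["vision", "demo"],
):
    ...  # make instrumented LLM calls here
```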
Config
- [x] #730
- [x] #731
- [x] #733
- [x] #732
- [x] #734
- [x] #633
- [x] #737
- [x] #632
- [x] #736
- [x] #634
- [x] #635
- [x] #735
Suppress Tracing
- [x] #748
- [x] #749
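These issues cover temporarily disabling span creation. A minimal sketch, assuming the `suppress_tracing` context manager exposed by `openinference-instrumentation` (the exact export name may differ) and reusing the OpenAI `client` from the earlier example:

```python
from openinference.instrumentation import suppress_tracing

# No spans are emitted for instrumented calls made inside this block,
# e.g. internal health checks or guardrail evaluations.
with suppress_tracing():
    client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "ping"}],
    )
```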
UI / Javascript
- [x] #568
- [x] #704
- [x] #821
- [x] #956
- [ ] [vision] instrumentation for langchain-js
Testing
- [ ] #558
Documentation
- [x] #561
- [x] #786
- [x] #787
- [ ] #788
- [ ] #833
Evals
- [ ] #574
Example of a vLLM client that should also support vision:
```python
import base64

import filetype
import httpx

# VLM_MODEL, VLLM_URL, VLLM_HEALTHCHECK, VLLM_READY_TIMEOUT, ALLOWED_IMAGE_TYPES,
# and wait_for_ready are defined elsewhere in the client module.


class VLMClient:
    def __init__(self, vlm_model: str = VLM_MODEL, vllm_url: str = VLLM_URL):
        self._vlm_model = vlm_model
        self._vllm_client = httpx.AsyncClient(base_url=vllm_url)
        if VLLM_HEALTHCHECK:
            wait_for_ready(
                server_url=vllm_url,
                wait_seconds=VLLM_READY_TIMEOUT,
                health_endpoint="health",
            )

    @property
    def vlm_model(self) -> str:
        return self._vlm_model

    async def __call__(
        self,
        prompt: str,
        image_bytes: bytes | None = None,
        image_filetype: filetype.Type | None = None,
        max_tokens: int = 10,
    ) -> str:
        # Assemble the message content
        message_content: list[dict[str, str | dict]] = [
            {
                "type": "text",
                "text": prompt,
            }
        ]
        if image_bytes is not None:
            if image_filetype is None:
                image_filetype = filetype.guess(image_bytes)
            if image_filetype is None:
                raise ValueError("Could not determine image filetype")
            if image_filetype not in ALLOWED_IMAGE_TYPES:
                raise ValueError(
                    f"Image type {image_filetype} is not supported. Allowed types: {ALLOWED_IMAGE_TYPES}"
                )
            image_b64 = base64.b64encode(image_bytes).decode("utf-8")
            message_content.append(
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:{image_filetype.mime};base64,{image_b64}",
                    },
                }
            )
        # Put together the request payload
        payload = {
            "model": self.vlm_model,
            "messages": [{"role": "user", "content": message_content}],
            "max_tokens": max_tokens,
            # "logprobs": True,
            # "top_logprobs": 1,
        }
        response = await self._vllm_client.post("/v1/chat/completions", json=payload)
        response = response.json()
        response_text: str = (
            response.get("choices")[0].get("message", {}).get("content", "").strip()
        )
        return response_text
```
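A hypothetical usage sketch of the client above; the image path and event-loop setup are illustrative:

```python
import asyncio


async def main() -> None:
    client = VLMClient()
    with open("boardwalk.jpg", "rb") as f:  # illustrative local image
        image_bytes = f.read()
    answer = await client(
        prompt="What's in this image?",
        image_bytes=image_bytes,
        max_tokens=50,
    )
    print(answer)


asyncio.run(main())
```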
Closing as completed since image support is done. Audio will come as part of the OpenAI Realtime instrumentation.