
[Question]: How to attach images for a vision model in python?

rmc507 opened this issue 11 months ago • 2 comments

What is your question?

I am making a Python program to analyze footage by summarizing evenly spaced frames from a video with a vision model, then generating a summary of all the summaries. Kind of redundant, but it's the best workaround I could find, and it works alright. I got the whole thing working using the llava model on Ollama, passing the image as base64. It works OK, but for it to work better I need a more powerful vision model. I want to use xAI's API with Grok (grok-2-vision-1212; the API is the same as OpenAI's in theory) and have spent at least 4-5 hours trying to figure it out. Then I had the bright idea to see if Fabric could do it, and it sure could. I guess I could just make the Python code run the fabric command, but that would be slow, and I really want to make this work. So if anyone knows how the heck Fabric passes the image, please let me know so I don't go insane.
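For context, here is a minimal sketch of the evenly-spaced-frames step, assuming OpenCV (cv2) is available; the function name and frame count are illustrative, not from the original program:

import cv2

def extract_frames(video_path, num_frames=10):
    # Grab num_frames evenly spaced frames from a video as JPEG bytes
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(num_frames):
        # Seek to the next evenly spaced frame index
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i * total / num_frames))
        ok, frame = cap.read()
        if not ok:
            break
        # Encode the raw frame as JPEG so it can be base64-encoded later
        ok, buf = cv2.imencode(".jpg", frame)
        if ok:
            frames.append(buf.tobytes())
    cap.release()
    return frames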

rmc507 avatar Jan 01 '25 07:01 rmc507

And yes, I tried using dry run. (screenshot attached: image_2025-01-01_003933852)

rmc507 avatar Jan 01 '25 07:01 rmc507

For Groq you can use the Python openai package: just change the base URL to the Groq API endpoint and the model name, then use the default code for sending a base64-encoded image. I just saw that you meant the other model, Grok, but I have tested this on Groq, and the process is the same.
Here is the example:

from openai import AsyncOpenAI
from PIL import Image
import base64
from io import BytesIO

client = AsyncOpenAI(api_key="<Key>", base_url="https://api.groq.com/openai/v1")

def image_to_base64(image_path):
    # Open the image using Pillow
    with Image.open(image_path) as img:
        # Create a buffer to save the image in memory
        buffered = BytesIO()
        # Save the image to the buffer
        img.save(buffered, format=img.format)
        # Get the byte data from the buffer
        img_bytes = buffered.getvalue()
    # Encode the bytes to a Base64 string
    base64_string = base64.b64encode(img_bytes).decode('utf-8')
    return base64_string
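
If the Pillow round trip isn't needed, a smaller variant (a sketch, assuming the file on disk is already in the format the data URL will claim) is to read and encode the raw file bytes:

def image_file_to_base64(image_path):
    # Read the file bytes as-is and base64-encode them
    # (relies on the base64 import above)
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")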

img_base64 = image_to_base64(r"banana1.jpeg")
# Note: the data URL below hard-codes image/jpeg, so the input should be a JPEG
img_str = f"data:image/jpeg;base64,{img_base64}"

async def make_request():
    response = await client.chat.completions.create(
        model="llama-3.2-11b-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What is on this picture?"},
                    {"type": "image_url", "image_url": {"url": img_str}},
                ],
            }
        ],
    )

    return response.choices[0].message

if __name__ == "__main__": 
    import asyncio

    r = asyncio.run(make_request())
    print(r)
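
To target xAI's Grok instead of Groq, only the client configuration and model name should need to change, assuming xAI's endpoint is OpenAI-compatible as the question suggests; the base URL below is taken from xAI's documentation and is worth double-checking:

# Same request pattern, pointed at xAI (base URL per xAI's docs; verify it)
client = AsyncOpenAI(api_key="<xAI Key>", base_url="https://api.x.ai/v1")

# ...and in make_request(), swap the model name:
#     model="grok-2-vision-1212",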

Lumberj3ck avatar Jan 04 '25 20:01 Lumberj3ck

This is cool but has nothing to do with Fabric. Closing.

ksylvan avatar Jul 06 '25 10:07 ksylvan