Fabric
[Question]: How to attach images for a vision model in python?
What is your question?
I am making a Python program to analyze footage: it runs evenly spaced frames from a video through a vision model to get a summary of each frame, then generates a summary of all the summaries. Kind of redundant, but it's the best workaround I could find, and it works alright. I got the whole thing working with the llava model on Ollama, using base64 to pass the image. It works OK, but to get better results I need a more powerful vision model. I want to use xAI's API with grok-2-vision-1212 (the API is the same as OpenAI's, in theory) and have spent at least 4-5 hours trying to figure it out. Then I had the bright idea to see if Fabric could do it, and it sure could. I guess I could just have the Python code run the fabric command, but that would be slow, and I really want to make this work. So if anyone knows how the heck Fabric passes the image, please let me know so I don't go insane.
And yes, I tried using dry run.
For Groq, you can try the Python openai package: just change the base URL to the Groq API and the model name for
Here is an example:
import asyncio
import base64
from io import BytesIO

from openai import AsyncOpenAI
from PIL import Image

client = AsyncOpenAI(api_key="<Key>", base_url="https://api.groq.com/openai/v1")


def image_to_base64(image_path):
    # Open the image using Pillow
    with Image.open(image_path) as img:
        # Create a buffer to save the image in memory
        buffered = BytesIO()
        # Save the image to the buffer in its original format
        img.save(buffered, format=img.format)
        # Get the byte data from the buffer
        img_bytes = buffered.getvalue()
    # Encode the bytes to a Base64 string
    base64_string = base64.b64encode(img_bytes).decode('utf-8')
    return base64_string


img_base64 = image_to_base64(r"banana1.jpeg")
# The image is passed as a data URL inside an "image_url" content part
img_str = f"data:image/jpeg;base64,{img_base64}"


async def make_request():
    response = await client.chat.completions.create(
        model="llama-3.2-11b-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What is on this picture?"},
                    {"type": "image_url", "image_url": {"url": img_str}},
                ],
            }
        ],
    )
    return response.choices[0].message


if __name__ == "__main__":
    r = asyncio.run(make_request())
    print(r)
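Since the question is about many frames and a different provider, here is a minimal sketch of the payload-building part on its own, with no Pillow and no network call. The helper names `image_to_data_url` and `build_vision_messages` are made up for illustration, and the JPEG fallback for unknown extensions is an assumption. It only shows the same "text part plus image_url part" message shape as the example above. Whether a given provider accepts it depends on how OpenAI-compatible its endpoint really is.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(image_path):
    """Read an image file and return it as a base64 data URL (hypothetical helper)."""
    path = Path(image_path)
    # Guess the MIME type from the file extension; fall back to JPEG (an assumption).
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        mime_type = "image/jpeg"
    # Encode the raw file bytes; the image is not decoded or re-saved.
    encoded = base64.b64encode(path.read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def build_vision_messages(prompt, image_path):
    """Build an OpenAI-style `messages` list with one text part and one image part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_to_data_url(image_path)}},
            ],
        }
    ]
```

The returned list can be passed as the `messages` argument to `client.chat.completions.create(...)`, once per extracted frame, with `base_url` and `model` set to whatever the provider documents.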
This is cool but has nothing to do with Fabric. Closing.