Multiple input images/text messages get grouped into one inline item and fail to render

Open svilupp opened this issue 5 months ago • 10 comments

Description

Expectation: I am sending multiple images as per the docs, and I'd expect them to render correctly in Logfire, including any text sent in between the images, e.g. surrounding tags like <image1>...</image1>

Reality: With specific images sent as BinaryContent, all images and text are collapsed into one inline blob, both in the visual rendering and in the recorded httpx request payload (see screenshots below).

Failing input structure

prompt = [
    "Please describe each garment separately.",
    "<garment1_image>",
    garment1_image,
    "</garment1_image>\n<garment2_image>",
    garment2_image,
    "</garment2_image>",
]

Logfire shows:

Image

Request:

Image

Findings so far

  • It works for most images, but a few specific files fail, e.g. Yellow-Linen-Short-Sleeve-Shirt-MS364HS-Model__79148.1711385764.jpg
  • If you provide the same image as ImageUrl, it works
  • I checked the bytes, and it really is a JPEG; everything seems correct
  • It fails regardless of whether it's the first or second image (no ordering difference)
  • If you swap out the failing image, it works; likewise, if you send the failing image as ImageUrl and keep the rest as BinaryContent, it works as expected
  • It happens across providers (Gemini, OpenAI)
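
The byte check mentioned above can be sketched as follows; `looks_like_jpeg` is a hypothetical helper, not part of the repro script:

```python
def looks_like_jpeg(data: bytes) -> bool:
    # A JPEG file starts with the SOI marker (FF D8) and ends with
    # the EOI marker (FF D9); anything else is suspect.
    return data[:2] == b"\xff\xd8" and data[-2:] == b"\xff\xd9"

# Example: read a downloaded file and sanity-check it
# with open("/Users/<user>/Downloads/img1.jpg", "rb") as f:
#     print(looks_like_jpeg(f.read()))
```

Since the failing file passes a check like this, the bytes themselves are not the problem.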

Reproducible Example

  • download these two files and update the garment1 and garment2 image paths in the script below:
    https://cdn11.bigcommerce.com/s-2fhihzl616/images/stencil/1920w/products/6943/27593/Yellow-Linen-Short-Sleeve-Shirt-MS364HS-Model__79148.1711385764.jpg
    https://www.brooktaverner.co.uk/media/catalog/product/cache/80bf7bdf2d49ba613ba3da30cc9a5879/w/i/willis_4342c_0982_rt_web_grey.jpg
  • run the script below
  • observe the captured logs when the images are sent as binary content: everything is clumped into a single inline input
#!/usr/bin/env python3
"""
Minimal Working Example: Failing Logfire/PydanticAI with dual image upload.
Split into two runs: one with URLs and one with BinaryContent. BinaryContent triggers the issue (it appears to send only one blob).

Download these images locally to test "load_images_as_binary"
https://cdn11.bigcommerce.com/s-2fhihzl616/images/stencil/1920w/products/6943/27593/Yellow-Linen-Short-Sleeve-Shirt-MS364HS-Model__79148.1711385764.jpg
https://www.brooktaverner.co.uk/media/catalog/product/cache/80bf7bdf2d49ba613ba3da30cc9a5879/w/i/willis_4342c_0982_rt_web_grey.jpg

"""

import os
from pydantic_ai import Agent, BinaryContent, ImageUrl
from pydantic_ai.models.gemini import GeminiModel
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

import logfire
logfire.configure(
    token=os.getenv("LOGFIRE_TOKEN"), environment="development", scrubbing=False
)
logfire.instrument_httpx(capture_request_body=True, capture_response_body=True)
logfire.instrument_pydantic_ai()


def load_images_as_urls():
    """Utility to load images as URLs."""
    garment1_image_url = "https://cdn11.bigcommerce.com/s-2fhihzl616/images/stencil/1920w/products/6943/27593/Yellow-Linen-Short-Sleeve-Shirt-MS364HS-Model__79148.1711385764.jpg"
    garment2_image_url = "https://www.brooktaverner.co.uk/media/catalog/product/cache/80bf7bdf2d49ba613ba3da30cc9a5879/w/i/willis_4342c_0982_rt_web_grey.jpg"
    
    garment1_image = ImageUrl(garment1_image_url)
    garment2_image = ImageUrl(garment2_image_url)
    
    return garment1_image, garment2_image


def load_images_as_binary():
    """Utility to load images as binary content."""
    garment1_image_path = "/Users/<user>/Downloads/img1.jpg"
    garment2_image_path = "/Users/<user>/Downloads/img2.jpg"
    
    # Load both images as BinaryContent, closing the files promptly
    with open(garment1_image_path, "rb") as f:
        garment1_image = BinaryContent(data=f.read(), media_type="image/jpeg")
    with open(garment2_image_path, "rb") as f:
        garment2_image = BinaryContent(data=f.read(), media_type="image/jpeg")
    
    return garment1_image, garment2_image


def main():
    """Run both test scenarios - only the image loading method changes."""
    
    # Initialize the agent (same for both tests)
    agent = Agent(
        model=GeminiModel(
            model_name="gemini-2.5-flash-lite",
        )
    )
    
    # Test 1: URLs
    print("=== TESTING WITH URLS ===")
    garment1_image, garment2_image = load_images_as_urls()
    
    # Create prompt with both images (same structure for both tests)
    prompt = [
        "Please describe each garment separately.",
        "<garment1_image>",
        garment1_image,
        "</garment1_image>\n<garment2_image>",
        garment2_image,
        "</garment2_image>"
    ]
    
    # Send to PydanticAI (same for both tests)
    response = agent.run_sync(prompt)
    
    print("-" * 50)
    print("URL RESULTS:")
    print(response.output)
    print("-" * 50)
    
    # Test 2: Binary Content (only image loading changes)
    print("\n=== TESTING WITH BINARY CONTENT ===")
    garment1_image, garment2_image = load_images_as_binary()
    
    # Same prompt structure
    prompt = [
        "Please describe each garment separately.",
        "<garment1_image>",
        garment1_image,
        "</garment1_image>\n<garment2_image>",
        garment2_image,
        "</garment2_image>"
    ]
    
    # Same agent call
    response = agent.run_sync(prompt)
    
    print("-" * 50)
    print("BINARY CONTENT RESULTS:")
    print(response.output)
    print("-" * 50)


if __name__ == "__main__":
    main()

Python, Logfire & OS Versions, related packages (not required)

>>> import logfire; print(logfire.logfire_info())
logfire="4.0.0"
platform="macOS-15.5-arm64-arm-64bit"
python="3.12.10 (main, May 22 2025, 01:38:44) [Clang 20.1.4 ]"
[related_packages]
requests="2.32.4"
pydantic="2.11.7"
fastapi="0.116.1"
openai="1.97.1"
protobuf="6.31.1"
rich="14.0.0"
executing="2.2.0"
opentelemetry-api="1.35.0"
opentelemetry-exporter-otlp-proto-common="1.35.0"
opentelemetry-exporter-otlp-proto-http="1.35.0"
opentelemetry-instrumentation="0.56b0"
opentelemetry-instrumentation-httpx="0.56b0"
opentelemetry-proto="1.35.0"
opentelemetry-sdk="1.35.0"
opentelemetry-semantic-conventions="0.56b0"
opentelemetry-util-http="0.56b0"

svilupp avatar Jul 28 '25 13:07 svilupp

Side note: Previously, I was seeing bad AI responses (completely hallucinated), so I assumed it was a PydanticAI bug (given the httpx payloads). I couldn't reproduce that behavior in the minimal example above, but there are several differences from my original pipeline:

  • I was using system_prompt and deps for the agent
  • I was using structured outputs (output_type field)

Not sure if it's relevant, but providing for completeness.

svilupp avatar Jul 28 '25 13:07 svilupp

This seems like a pydantic-ai bug; I don't believe the http.request.body.text in your screenshot is somehow different from what was actually sent.

alexmojaki avatar Jul 28 '25 13:07 alexmojaki

This seems like a pydantic-ai bug; I don't believe the http.request.body.text in your screenshot is somehow different from what was actually sent.

It confused me too, but I get this text as the LLM reply:

=== TESTING WITH BINARY CONTENT ===
14:35:36.180 agent run
14:35:36.180 chat gemini-2.5-flash-lite
14:35:36.181 POST generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-lite:generateContent
14:35:38.289 Reading response body

BINARY CONTENT RESULTS: Here's a description of each garment:

Garment 1:

This is a short-sleeved, collared button-down shirt made of a lightweight, breathable fabric, likely linen or a linen blend, given its slightly crinkled texture. It's a cheerful pale yellow color. The shirt features a classic button-up front, a single chest pocket on the left side, and short sleeves that hit around the bicep. It has a relaxed, casual fit.

Garment 2:

This is a short-sleeved t-shirt with a classic crew neck. It features a horizontal stripe pattern in alternating shades of off-white or cream and a light peachy-beige or tan. The stripes have a slightly textured or hand-painted appearance, giving the t-shirt a casual and artistic feel. The fabric appears to be a soft cotton or cotton blend.

That reply describes both garments, so the model clearly received both images. I wonder if httpx has some optimization under the hood that collapses things?

svilupp avatar Jul 28 '25 13:07 svilupp

oh, here's the problem:

Image

alexmojaki avatar Jul 28 '25 14:07 alexmojaki

Halfway through the base64 image_url there is a .... The part before it comes from the first image; the part after comes from the second. The JSON text itself is truncated before it's parsed.

alexmojaki avatar Jul 28 '25 14:07 alexmojaki

If you use the smaller file twice (https://www.brooktaverner.co.uk/media/catalog/product/cache/80bf7bdf2d49ba613ba3da30cc9a5879/w/i/willis_4342c_0982_rt_web_grey.jpg) then it works. The other file is too big for our current system.
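
For scale: base64 inflates binary data by about a third, so a multi-megabyte JPEG becomes an even larger string inside the span attribute. The exact size limit isn't stated in this thread; the helper below is just the standard base64 length calculation:

```python
import base64

def base64_len(n_bytes: int) -> int:
    # Base64 encodes every 3 input bytes as 4 output characters,
    # padding the final group, so the output length is 4 * ceil(n / 3).
    return 4 * ((n_bytes + 2) // 3)

# Sanity check against the real encoder
assert base64_len(100) == len(base64.b64encode(b"\x00" * 100))
```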

alexmojaki avatar Jul 28 '25 14:07 alexmojaki

Thank you for debugging this so quickly! Will it be addressed by some UI update, or should I just expect it to stay as is?

The user impact: I spent 3 hours hunting down a bug in my own code and rewrote my pipelines several times, because Logfire kept telling me only one image was coming through (even when I checked the httpx payload).

I wouldn't have expected truncation to change the structure of the httpx payload like this (hiding the text in between the images):

Image

svilupp avatar Jul 28 '25 16:07 svilupp

Imagine the attribute looks like this:

[
  {
    "text": "<image1>"
  },
  {
    "data": "AAAAAAAAAAAAAAAAAAAAAA"
  },
  {
    "text": "</image1>"
  },
  {
    "text": "<image2>"
  },
  {
    "data": "BBBBBBBBBBBBBBBBBBBBBBB"
  },
  {
    "text": "</image2>"
  }
]

but the data values are much longer. When it reaches our backend for the first time, it gets truncated, because we can't deal with arbitrarily large span attributes. Think of the whole thing as one big string, not a parsed JSON object: truncation keeps the first and last N characters. That reduces it to this:

[
  {
    "text": "<image1>"
  },
  {
    "data": "AAAAAAAAAA...BBBBBBBBBBBBBB"
  },
  {
    "text": "</image2>"
  }
]
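
The crude strategy described here (keep the first and last N characters of the serialized string) can be sketched as follows; the function name and cutoff are illustrative, not Logfire's actual implementation:

```python
import json

def truncate_middle(s: str, keep: int = 60, marker: str = "...") -> str:
    # Keep the first and last `keep` characters of the raw string;
    # any JSON structure inside the string is not considered.
    if len(s) <= 2 * keep + len(marker):
        return s
    return s[:keep] + marker + s[-keep:]

attribute = json.dumps([
    {"text": "<image1>"},
    {"data": "A" * 500},   # stand-in for the first base64 payload
    {"text": "</image1>"},
    {"text": "<image2>"},
    {"data": "B" * 500},   # stand-in for the second base64 payload
    {"text": "</image2>"},
])

truncated = truncate_middle(attribute)
# The head of the first blob and the tail of the second end up fused
# into what looks like a single "data" value, and the text parts in
# between are cut out entirely.
print(truncated)
```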

We do want to make truncation smarter; it's just not obvious what the best approach is for arbitrary data structures.

We also want to build 'blob storage', which would store large values like images separately from everything else, reducing the need for truncation. This would work especially well with the pydantic-ai instrumentation, where we can explicitly mark certain things as 'blobs'.

alexmojaki avatar Jul 28 '25 16:07 alexmojaki

Note that if you send only the big image, it still gets truncated, but then the only thing cut is part of the base64 data, so the full JSON structure stays intact, which is what the current crude strategy is meant to achieve at least some of the time.

alexmojaki avatar Jul 28 '25 17:07 alexmojaki

Understood! Thank you for the explanation!

Given that PydanticAI parts have a clearly defined structure, I hope that in the future they can get preferential treatment so truncation happens within the message parts, not across the whole blob! 😃 🤞
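
A part-aware truncation along those lines could look roughly like this; it's a hypothetical sketch, not a planned Logfire API:

```python
def truncate_parts(parts, keep=10, marker="..."):
    # Truncate each part's 'data' value individually, leaving the
    # surrounding list structure and all 'text' parts intact.
    truncated = []
    for part in parts:
        part = dict(part)
        data = part.get("data")
        if data is not None and len(data) > 2 * keep + len(marker):
            part["data"] = data[:keep] + marker + data[-keep:]
        truncated.append(part)
    return truncated

parts = [
    {"text": "<image1>"},
    {"data": "A" * 100},
    {"text": "</image1>"},
    {"text": "<image2>"},
    {"data": "B" * 100},
    {"text": "</image2>"},
]
# Both blobs are shortened, but all six parts and the text in
# between survive.
print(truncate_parts(parts))
```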

I'll leave the issue open; it may help others who run into the same situation.

svilupp avatar Jul 28 '25 18:07 svilupp