Multiple input images/text messages get grouped into one inline item and fail to render
Description
Expectation: I am sending multiple images as per the docs, and I'd expect them to render correctly in Logfire, along with any text sent in between the images, e.g. surrounding tags like `<image1>...</image1>`.
Reality: For specific images sent as `BinaryContent`, all images and text collapse into one inline blob (both in the visual rendering and in the recorded httpx request payload - see screenshots below).
Failing input structure
```python
prompt = [
    "Please describe each garment separately.",
    "<garment1_image>",
    garment1_image,
    "</garment1_image>\n<garment2_image>",
    garment2_image,
    "</garment2_image>",
]
```
Logfire shows: (screenshot)
Request: (screenshot)
Findings so far
- It works for most images, but a few fail, e.g. `Yellow-Linen-Short-Sleeve-Shirt-MS364HS-Model__79148.1711385764.jpg`. If you provide the same image as `ImageUrl`, it works. I checked the bytes, and it really is a JPEG; everything seems correct.
- It fails regardless of whether it's the first or second image (no ordering difference).
- If you swap out the failing image, it works -- or if you send the failing image as `ImageUrl` and keep the rest as `BinaryContent`, it also works as expected.
- It happens with multiple providers (Gemini, OpenAI).
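For reference, a quick way to sanity-check that a file really is a complete JPEG (roughly what I did when "checking the bytes"; the helper name and the magic-byte check are my own sketch, not anything from pydantic-ai) is to look at the Start-Of-Image and End-Of-Image markers:

```python
# Sketch: verify that a file looks like a complete JPEG by checking
# the SOI (FF D8) and EOI (FF D9) markers at the start and end.
from pathlib import Path

def looks_like_jpeg(path: str) -> bool:
    data = Path(path).read_bytes()
    return data[:2] == b"\xff\xd8" and data[-2:] == b"\xff\xd9"
```

Both failing and working images passed this check, which is why the bytes themselves didn't look like the problem.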
Reproducible Example
- Download these two files and update the garment1 and garment2 image paths in the code below:
  - https://cdn11.bigcommerce.com/s-2fhihzl616/images/stencil/1920w/products/6943/27593/Yellow-Linen-Short-Sleeve-Shirt-MS364HS-Model__79148.1711385764.jpg
  - https://www.brooktaverner.co.uk/media/catalog/product/cache/80bf7bdf2d49ba613ba3da30cc9a5879/w/i/willis_4342c_0982_rt_web_grey.jpg
- Run the script below.
- Observe the captured logs when the images are sent as binary content -- they are clumped into a single inline input.
```python
#!/usr/bin/env python3
"""
Minimal working example: failing Logfire/PydanticAI with dual image upload.

Split into two runs: one with URLs and one with BinaryContent. BinaryContent
creates issues (it seems to send only one blob).

Download these images locally to test load_images_as_binary():
https://cdn11.bigcommerce.com/s-2fhihzl616/images/stencil/1920w/products/6943/27593/Yellow-Linen-Short-Sleeve-Shirt-MS364HS-Model__79148.1711385764.jpg
https://www.brooktaverner.co.uk/media/catalog/product/cache/80bf7bdf2d49ba613ba3da30cc9a5879/w/i/willis_4342c_0982_rt_web_grey.jpg
"""
import os

from dotenv import load_dotenv
from pydantic_ai import Agent, BinaryContent, ImageUrl
from pydantic_ai.models.gemini import GeminiModel

# Load environment variables
load_dotenv()

import logfire

logfire.configure(
    token=os.getenv("LOGFIRE_TOKEN"), environment="development", scrubbing=False
)
logfire.instrument_httpx(capture_request_body=True, capture_response_body=True)
logfire.instrument_pydantic_ai()


def load_images_as_urls():
    """Utility to load images as URLs."""
    garment1_image_url = "https://cdn11.bigcommerce.com/s-2fhihzl616/images/stencil/1920w/products/6943/27593/Yellow-Linen-Short-Sleeve-Shirt-MS364HS-Model__79148.1711385764.jpg"
    garment2_image_url = "https://www.brooktaverner.co.uk/media/catalog/product/cache/80bf7bdf2d49ba613ba3da30cc9a5879/w/i/willis_4342c_0982_rt_web_grey.jpg"
    garment1_image = ImageUrl(garment1_image_url)
    garment2_image = ImageUrl(garment2_image_url)
    return garment1_image, garment2_image


def load_images_as_binary():
    """Utility to load images as binary content."""
    garment1_image_path = "/Users/<user>/Downloads/img1.jpg"
    garment2_image_path = "/Users/<user>/Downloads/img2.jpg"
    # Load both images as BinaryContent
    with open(garment1_image_path, "rb") as f:
        garment1_image = BinaryContent(data=f.read(), media_type="image/jpeg")
    with open(garment2_image_path, "rb") as f:
        garment2_image = BinaryContent(data=f.read(), media_type="image/jpeg")
    return garment1_image, garment2_image


def main():
    """Run both test scenarios - only the image loading method changes."""
    # Initialize the agent (same for both tests)
    agent = Agent(model=GeminiModel(model_name="gemini-2.5-flash-lite"))

    # Test 1: URLs
    print("=== TESTING WITH URLS ===")
    garment1_image, garment2_image = load_images_as_urls()
    # Create prompt with both images (same structure for both tests)
    prompt = [
        "Please describe each garment separately.",
        "<garment1_image>",
        garment1_image,
        "</garment1_image>\n<garment2_image>",
        garment2_image,
        "</garment2_image>",
    ]
    # Send to PydanticAI (same for both tests)
    response = agent.run_sync(prompt)
    print("-" * 50)
    print("URL RESULTS:")
    print(response.output)
    print("-" * 50)

    # Test 2: Binary content (only the image loading changes)
    print("\n=== TESTING WITH BINARY CONTENT ===")
    garment1_image, garment2_image = load_images_as_binary()
    # Same prompt structure
    prompt = [
        "Please describe each garment separately.",
        "<garment1_image>",
        garment1_image,
        "</garment1_image>\n<garment2_image>",
        garment2_image,
        "</garment2_image>",
    ]
    # Same agent call
    response = agent.run_sync(prompt)
    print("-" * 50)
    print("BINARY CONTENT RESULTS:")
    print(response.output)
    print("-" * 50)


if __name__ == "__main__":
    main()
```
Python, Logfire & OS Versions, related packages (not required)
```
>>> import logfire; print(logfire.logfire_info())
logfire="4.0.0"
platform="macOS-15.5-arm64-arm-64bit"
python="3.12.10 (main, May 22 2025, 01:38:44) [Clang 20.1.4 ]"
[related_packages]
requests="2.32.4"
pydantic="2.11.7"
fastapi="0.116.1"
openai="1.97.1"
protobuf="6.31.1"
rich="14.0.0"
executing="2.2.0"
opentelemetry-api="1.35.0"
opentelemetry-exporter-otlp-proto-common="1.35.0"
opentelemetry-exporter-otlp-proto-http="1.35.0"
opentelemetry-instrumentation="0.56b0"
opentelemetry-instrumentation-httpx="0.56b0"
opentelemetry-proto="1.35.0"
opentelemetry-sdk="1.35.0"
opentelemetry-semantic-conventions="0.56b0"
opentelemetry-util-http="0.56b0"
```
Side note: Previously, I was seeing bad AI responses (completely hallucinated), so I assumed it was a PydanticAI bug (given the httpx payloads). I couldn't reproduce that behavior in the minified example above, but there are several differences from my original pipeline:
- I was using `system_prompt` and `deps` for the agent
- I was using structured outputs (the `output_type` field)

Not sure if it's relevant, but providing it for completeness.
This seems like a pydantic-ai bug; I don't believe that your screenshot of `http.request.body.text` somehow isn't what was actually sent.
Confused me too, but I get this text as an LLM reply:
```
=== TESTING WITH BINARY CONTENT ===
14:35:36.180 agent run
14:35:36.180   chat gemini-2.5-flash-lite
14:35:36.181     POST generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-lite:generateContent
14:35:38.289       Reading response body
```
BINARY CONTENT RESULTS: Here's a description of each garment:
Garment 1:
This is a short-sleeved, collared button-down shirt made of a lightweight, breathable fabric, likely linen or a linen blend, given its slightly crinkled texture. It's a cheerful pale yellow color. The shirt features a classic button-up front, a single chest pocket on the left side, and short sleeves that hit around the bicep. It has a relaxed, casual fit.
Garment 2:
This is a short-sleeved t-shirt with a classic crew neck. It features a horizontal stripe pattern in alternating shades of off-white or cream and a light peachy-beige or tan. The stripes have a slightly textured or hand-painted appearance, giving the t-shirt a casual and artistic feel. The fabric appears to be a soft cotton or cotton blend.
That describes both garments. I wonder if HTTPX has some optimization under the hood to collapse things?
Oh, here's the problem: halfway through the base64 `image_url` there is a `...`. The part before it comes from the first image; the part after comes from the second. The JSON text itself is truncated before it's parsed.
If you use the smaller file twice (https://www.brooktaverner.co.uk/media/catalog/product/cache/80bf7bdf2d49ba613ba3da30cc9a5879/w/i/willis_4342c_0982_rt_web_grey.jpg) then it works. The other file is too big for our current system.
Thank you for debugging this so quickly! Will it be addressed by some UI updates, or should I just expect it to stay as is?
User impact: I spent 3 hours hunting down a bug in my own code and rewriting my pipelines several times, because Logfire kept telling me only one image was coming through (even when I checked the httpx payload).
I wouldn't have expected truncation to change the structure of the httpx payload like this (hiding the text in between the images).
Imagine the attribute looks like this:
```json
[
  {"text": "<image1>"},
  {"data": "AAAAAAAAAAAAAAAAAAAAAA"},
  {"text": "</image1>"},
  {"text": "<image2>"},
  {"data": "BBBBBBBBBBBBBBBBBBBBBBB"},
  {"text": "</image2>"}
]
```
but the data values are much longer. When the payload first reaches our backend, it gets truncated, because we can't handle arbitrarily large span attributes. Think of the whole thing as one big string, not as a parsed JSON object. Truncation keeps the first and last N characters, which reduces it to this:
```json
[
  {"text": "<image1>"},
  {"data": "AAAAAAAAAA...BBBBBBBBBBBBBB"},
  {"text": "</image2>"}
]
```
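The collapse can be reproduced with a toy head-and-tail truncation over the serialized attribute. This is a sketch of the mechanism as described above, not Logfire's actual code; the function name and the `keep` parameter are illustrative:

```python
import json

def truncate_middle(s: str, keep: int) -> str:
    """Keep the first and last `keep` characters, joining them with '...'."""
    if len(s) <= 2 * keep:
        return s
    return s[:keep] + "..." + s[-keep:]

parts = [
    {"text": "<image1>"},
    {"data": "A" * 500},   # stands in for the first base64 blob
    {"text": "</image1>"},
    {"text": "<image2>"},
    {"data": "B" * 500},   # stands in for the second base64 blob
    {"text": "</image2>"},
]
raw = json.dumps(parts)
truncated = truncate_middle(raw, keep=120)
# The '...' lands inside the first "data" string and the kept tail starts
# inside the second one, so the two blobs fuse into a single "data" value
# and the text parts between them vanish -- yet the result is valid JSON.
print(truncated)
```

Because the cut points happen to fall inside two string literals of the same shape, the truncated text still parses, which is exactly why the UI renders it as one image with no surrounding text.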
We do want to make truncation smarter; it's just not obvious what the best strategy is for arbitrary data structures.
We also want to build 'blob storage', which would store big values like images separately from everything else, reducing the need for truncation. This would work especially well with the pydantic-ai instrumentation, where we can explicitly mark certain things as 'blobs'.
Note that if you send only the big image, it still gets truncated, but then the only thing cut is part of the base64 data, so the full JSON structure stays intact -- which is what the current crude strategy manages to achieve at least some of the time.
Understood! Thank you for the explanation!
Given that PydanticAI parts have a clearly defined structure, I hope that in the future it can get preferential treatment to truncate within the message parts, not the whole blob! 😃 🤞
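As a rough sketch of what "truncate within the message parts" could mean (purely illustrative, not a Logfire API; the function names and the `keep` parameter are mine): truncate each oversized value before serializing, so every part and all the surrounding text survive.

```python
import json

def truncate_value(s: str, keep: int = 40) -> str:
    # Trim a single oversized value, keeping its head and tail.
    return s if len(s) <= 2 * keep else s[:keep] + "..." + s[-keep:]

def truncate_parts(parts: list[dict], keep: int = 40) -> str:
    # Truncate each part's values independently, then serialize:
    # the list structure and all text parts remain intact.
    slim = [{k: truncate_value(v, keep) for k, v in p.items()} for p in parts]
    return json.dumps(slim)
```

With this approach, both `<image1>`/`<image2>` tags and both (shortened) data blobs would still show up as separate parts in the rendered request.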
I'll leave the issue open. It could help others who run into the same situation.