Incorrect input token counting for Llama 3.2 with image inputs
### Describe the bug
The boto3 bedrock-runtime client appears to report incorrect input token counts for Llama 3.2 models when a request includes an image. While other Bedrock models report token counts that vary with image dimensions, Llama 3.2 consistently returns the same small token count regardless of image size.
### Expected Behavior
Input token counts for Llama 3.2 should increase with image size, as they do for other multimodal models such as Claude and Nova Lite. The reported token count should reflect the actual computational cost of processing the image.
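As a point of comparison, Anthropic documents a rough estimate for Claude's image token cost of width × height / 750 tokens, which roughly matches the Claude rows in the table below. No equivalent formula is published for Llama 3.2 on Bedrock, so the exact expected values are unknown, but whatever the per-image cost is, it should scale with resolution. A minimal sketch of the cross-check:

```python
# Sanity check against Anthropic's documented approximation for Claude:
# image tokens ~= (width_px * height_px) / 750. No equivalent formula is
# published for Llama 3.2 on Bedrock, so this is only a plausibility check.
def approx_claude_image_tokens(width: int, height: int) -> int:
    return round(width * height / 750)

for side in (512, 1120):
    print(f"{side}x{side}: ~{approx_claude_image_tokens(side, side)} tokens")
# 512x512:   ~350 tokens  (observed 371, including message overhead)
# 1120x1120: ~1673 tokens (observed 1531, same order of magnitude)
```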
### Current Behavior
Llama 3.2 consistently reports only 36 input tokens regardless of image dimensions (tested with both 512×512 and 1120×1120 resolution images). Other models correctly report larger token counts for larger images:
| Model | Image Size (px) | Input Tokens |
|---|---|---|
| Claude Sonnet 3.7 | 512×512 | 371 |
| Nova Lite | 512×512 | 1314 |
| Llama 3.2 11B | 512×512 | 36 |
| Claude Sonnet 3.7 | 1120×1120 | 1531 |
| Nova Lite | 1120×1120 | 2610 |
| Llama 3.2 11B | 1120×1120 | 36 |
This can lead to incorrect cost calculations and unreliable usage monitoring for Llama 3.2.
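One way to confirm that the image contributes nothing to the reported count is to diff against a text-only baseline. This is a diagnostic sketch, not part of the original reproduction; the solid-gray test image and the prompt text are arbitrary placeholders:

```python
import io

import boto3
from PIL import Image

bedrock_client = boto3.client('bedrock-runtime', region_name='us-east-1')
MODEL_ID = "us.meta.llama3-2-11b-instruct-v1:0"


def input_tokens(content):
    """Return usage.inputTokens for a single-turn Converse request."""
    response = bedrock_client.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": content}],
    )
    return response['usage']['inputTokens']


def solid_jpeg(side):
    """Build a solid-gray JPEG of the given size (placeholder test image)."""
    buffer = io.BytesIO()
    Image.new("RGB", (side, side), "gray").save(buffer, format="JPEG")
    return buffer.getvalue()


baseline = input_tokens([{"text": "Describe the image."}])
for side in (512, 1120):
    with_image = input_tokens([
        {"text": "Describe the image."},
        {"image": {"format": "jpeg", "source": {"bytes": solid_jpeg(side)}}},
    ])
    print(f"{side}x{side}: image adds {with_image - baseline} input tokens")
```

If the difference stays constant (or near zero) across image sizes, the image input is effectively not being counted at all.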
### Reproduction Steps
Run this code:
```python
# %pip install boto3 pandas pillow httpx
import io

import boto3
import httpx
import pandas as pd
from PIL import Image

# Initialize the Bedrock Runtime client
bedrock_client = boto3.client('bedrock-runtime', region_name='us-east-1')

models = {
    "Claude Sonnet 3.7": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    "Nova Lite": "amazon.nova-lite-v1:0",
    "Llama 3.2 11B": "us.meta.llama3-2-11b-instruct-v1:0",
}

image_sizes = [512, 1120]
results = []

for image_size in image_sizes:
    # Download a sample image from the web
    image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
    response = httpx.get(image_url)
    image = Image.open(io.BytesIO(response.content))
    image_format = image.format  # avoid shadowing the built-in `format`

    # Resize the image to the target dimensions
    size = (image_size, image_size)
    image = image.resize(size, Image.Resampling.LANCZOS)

    # Convert the image to bytes
    buffer = io.BytesIO()
    image.save(buffer, format=image_format)
    image_bytes = buffer.getvalue()

    # Create the message structure with just the image
    messages = [
        {
            "role": "user",
            "content": [
                # {
                #     "text": "What do you see in this image?"  # Your question here
                # },
                {
                    "image": {
                        "format": "jpeg",
                        "source": {"bytes": image_bytes},
                    }
                },
            ],
        }
    ]

    # Run each model
    for model_name, model_id in models.items():
        # Make the API call
        response = bedrock_client.converse(
            modelId=model_id,
            messages=messages,
        )
        ai_response = response['output']['message']['content'][0]['text']

        # Extract input tokens
        input_tokens = response['usage']['inputTokens']

        # Add to results
        results.append({
            "Model": model_name,
            "Image Size": size,
            "Input Tokens": input_tokens,
            # "AI Response": ai_response,
        })

df = pd.DataFrame(results)
print(df)
```
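As a quick follow-up (appended to the script above, reusing its `df`), a pivot makes the anomaly obvious by flagging any model whose input token count is constant across image sizes:

```python
# Pivot the results so each model is a row and each image size a column,
# then flag models whose count does not change with resolution.
pivot = df.pivot(index="Model", columns="Image Size", values="Input Tokens")
print(pivot)
suspicious = pivot[pivot.nunique(axis=1) == 1].index.tolist()
print("Size-independent token counts:", suspicious)  # expected: Llama 3.2 11B
```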
### Possible Solution
No response

### Additional Information/Context
No response
### SDK version used
1.37.36

### Environment details (OS name and version, etc.)
macOS