
Incorrect input token counting for Llama 3.2 with image inputs

Open austinmw opened this issue 6 months ago • 3 comments

Describe the bug

The boto3 bedrock-runtime client appears to incorrectly report input token counts for Llama 3.2 models when processing images. While other Bedrock models correctly report varying token counts based on image dimensions, Llama 3.2 consistently returns the same small token count regardless of image size.

Expected Behavior

Input token counts for Llama 3.2 should increase proportionally with image size, similar to other multimodal models like Claude and Nova Lite. The token count should reflect the actual computational cost of processing the image.

Current Behavior

Llama 3.2 consistently reports only 36 input tokens regardless of image dimensions (tested with both 512×512 and 1120×1120 resolution images). Other models correctly report larger token counts for larger images:

Model               Image Size      Input Tokens
Claude Sonnet 3.7   (512, 512)      371
Nova Lite           (512, 512)      1314
Llama 3.2 11B       (512, 512)      36
Claude Sonnet 3.7   (1120, 1120)    1531
Nova Lite           (1120, 1120)    2610
Llama 3.2 11B       (1120, 1120)    36
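For a sense of scale, Meta's public description of the Llama 3.2 vision architecture suggests images are split into 560×560 tiles (up to four tiles at the 1120×1120 maximum), with each tile encoded into roughly 1,601 visual tokens (40×40 patches at patch size 14, plus a class token). A minimal sketch of what size-proportional counts might look like under those assumptions — the tile size and per-tile token figure are assumptions taken from the model description, not values returned by Bedrock:

```python
import math

# Assumed Llama 3.2 vision parameters (from Meta's published model
# description; NOT confirmed by the Bedrock usage API):
TILE_SIZE = 560        # each tile is 560x560 pixels
MAX_TILES = 4          # up to 4 tiles at the 1120x1120 maximum
TOKENS_PER_TILE = 1601 # 40x40 patches at patch size 14, plus 1 class token

def estimate_image_tokens(width: int, height: int) -> int:
    """Rough estimate of visual tokens for one image under the
    tiling assumptions above."""
    tiles = math.ceil(width / TILE_SIZE) * math.ceil(height / TILE_SIZE)
    return min(tiles, MAX_TILES) * TOKENS_PER_TILE

print(estimate_image_tokens(512, 512))    # 1 tile  -> 1601
print(estimate_image_tokens(1120, 1120))  # 4 tiles -> 6404
```

If this tiling model is even roughly right, the expected counts are two orders of magnitude larger than the constant 36 tokens that Bedrock reports, and should differ between the two tested sizes.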

This may lead to incorrect cost calculations and usage monitoring for Llama 3.2.

Reproduction Steps

Run this code:

# %pip install boto3 pandas pillow

import io
import boto3
import pandas as pd
from PIL import Image
import httpx

# Initialize the Bedrock Runtime client
bedrock_client = boto3.client('bedrock-runtime', region_name='us-east-1')

image_sizes = [512, 1120]
results = []

for image_size in image_sizes:

    # web image
    image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
    response = httpx.get(image_url)
    image = Image.open(io.BytesIO(response.content))
    image_format = image.format  # e.g. "JPEG"; avoids shadowing the built-in format()

    # Resize image
    size = (image_size, image_size)
    image = image.resize(size, Image.Resampling.LANCZOS)

    # Convert image to bytes
    buffer = io.BytesIO()
    image.save(buffer, format=image_format)
    image_bytes = buffer.getvalue()

    models = {
        "Claude Sonnet 3.7": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
        "Nova Lite": "amazon.nova-lite-v1:0",
        "Llama 3.2 11B": "us.meta.llama3-2-11b-instruct-v1:0",
    }

    # Create the message structure with just the image
    messages = [
        {
            "role": "user",
            "content": [
                # {
                #     "text": "What do you see in this image?"  # Your question here
                # },
                {
                    "image": {
                        "format": "jpeg",
                        "source": {"bytes": image_bytes}
                    }
                },
            ]
        }
    ]

    # Run each model
    for model_name, model_id in models.items():

        # Make the API call
        response = bedrock_client.converse(
            modelId=model_id,
            messages=messages,
        )

        ai_response = response['output']['message']['content'][0]['text']

        # Extract input tokens
        input_tokens = response['usage']['inputTokens']
        
        # Add to results
        results.append({
            "Model": model_name,
            "Image Size": size,
            "Input Tokens": input_tokens,
            #"AI Response": ai_response,
        })


df = pd.DataFrame(results)
print(df)
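The anomaly in the reported numbers can also be shown directly by pivoting the results from the table above (values copied from this issue):

```python
import pandas as pd

# Input token counts as reported in this issue.
reported = pd.DataFrame([
    ("Claude Sonnet 3.7", 512, 371),
    ("Nova Lite", 512, 1314),
    ("Llama 3.2 11B", 512, 36),
    ("Claude Sonnet 3.7", 1120, 1531),
    ("Nova Lite", 1120, 2610),
    ("Llama 3.2 11B", 1120, 36),
], columns=["Model", "Size", "Input Tokens"])

# One row per model, one column per image size.
pivot = reported.pivot(index="Model", columns="Size", values="Input Tokens")

# Models whose reported count does not change with image size.
constant = pivot[pivot[512] == pivot[1120]].index.tolist()
print(constant)  # ['Llama 3.2 11B']
```

Only Llama 3.2 reports an identical count at both resolutions, which is what makes it look like the image contribution is being dropped from `usage.inputTokens` for this model.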

Possible Solution

No response

Additional Information/Context

No response

SDK version used

1.37.36

Environment details (OS name and version, etc.)

macOS

austinmw avatar Apr 18 '25 18:04 austinmw