Multimodal Bedrock causes error in botocore package

Open OliverThomas2000 opened this issue 9 months ago • 1 comments

[x] This is actually a bug report.
[x] I am not getting good LLM Results
[x] I have tried searching the documentation and have not found an answer.

What Model are you using?

[ ] gpt-3.5-turbo
[ ] gpt-4-turbo
[ ] gpt-4
[x] Other (please specify)

Claude 3 Sonnet via AWS Bedrock

Describe the bug

When attempting to use Bedrock for multimodal requests, I am unable to format the messages in a way that satisfies the underlying Bedrock client. I've tried all the example message structures from the multimodal example in the docs.

To Reproduce

I'm using the Bedrock client like so:

import boto3
import instructor
bedrock_runtime = boto3.client("bedrock-runtime")
client = instructor.from_bedrock(bedrock_runtime)

My messages list looks something like this:

image = instructor.Image.autodetect(<the base 64 content>)
messages = [
    {
        "role": "user",
        "content": ["find the vrm", image],
    }
]

response = client.chat.completions.create(messages=messages, modelId="<modelId>", response_model=VRMResponse)

If I try this I receive this error:

InstructorRetryException: Parameter validation failed:
Invalid number of parameters set for tagged union structure messages[0].content[0]. Can only set one of the following keys: text, image, document, video, toolUse, toolResult, guardContent.
Unknown parameter in messages[0].content[0]: "type", must be one of: text, image, document, video, toolUse, toolResult, guardContent
Invalid number of parameters set for tagged union structure messages[0].content[1]. Can only set one of the following keys: text, image, document, video, toolUse, toolResult, guardContent.
Unknown parameter in messages[0].content[1]: "type", must be one of: text, image, document, video, toolUse, toolResult, guardContent
Unknown parameter in messages[0].content[1]: "image_url", must be one of: text, image, document, video, toolUse, toolResult, guardContent
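
The errors above read like a tagged-union rule: each content block may set exactly one of the listed keys. The failing blocks appear to carry an OpenAI-style `type` key (and `image_url` for the image), though that is inferred from the error text rather than confirmed. A minimal sketch of that rule, using a hypothetical helper `check_content_block` that only imitates the check and is not botocore's actual validation logic:

```python
# Allowed keys copied from the validation error above.
ALLOWED_KEYS = {
    "text", "image", "document", "video",
    "toolUse", "toolResult", "guardContent",
}


def check_content_block(block: dict) -> list[str]:
    """Return a list of problems found in one content block (hypothetical helper)."""
    problems = []
    # Any key outside the allowed set is reported as unknown.
    unknown = sorted(set(block) - ALLOWED_KEYS)
    if unknown:
        problems.append(f"unknown keys: {unknown}")
    # A tagged union must set exactly one member.
    if len(block) != 1:
        problems.append(f"expected exactly one key, got {len(block)}")
    return problems


# Assumed shape of the rejected block, inferred from the error text.
openai_style = {"type": "text", "text": "find the vrm"}
# Shape that satisfies the one-key rule.
converse_style = {"text": "find the vrm"}

print(check_content_block(openai_style))
print(check_content_block(converse_style))
```

The first block is flagged twice (unknown `type` key, two keys set), which matches the pair of messages reported for `content[0]`; the second passes.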

The most success I've had was formatting messages like this (closer to the boto3 standard):

messages = [
            {
                "role": "user",
                "content": [
                    {
                        "image": {
                            "data": image,
                            "media_type": "image/png"
                        }
                    },
                    {
                        "text": "Extract the VRM"
                    }
                ]
            }
        ]

This fails with an error (inside botocore):

ParamValidationError                      Traceback (most recent call last)
File ~/PythonCode/ANPR_deploy/.venv/lib/python3.12/site-packages/instructor/retry.py:168, in retry_sync(func, response_model, args, kwargs, context, max_retries, strict, mode, hooks)
    167 hooks.emit_completion_arguments(*args, **kwargs)
--> 168 response = func(*args, **kwargs)
    169 hooks.emit_completion_response(response)

File ~/PythonCode/ANPR_deploy/.venv/lib/python3.12/site-packages/botocore/client.py:569, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
    568 # The "self" in this scope is referring to the BaseClient.
--> 569 return self._make_api_call(operation_name, kwargs)

File ~/PythonCode/ANPR_deploy/.venv/lib/python3.12/site-packages/botocore/client.py:980, in BaseClient._make_api_call(self, operation_name, api_params)
    979     request_context['endpoint_properties'] = properties
--> 980 request_dict = self._convert_to_request_dict(
    981     api_params=api_params,
    982     operation_model=operation_model,
    983     endpoint_url=endpoint_url,
    984     context=request_context,
    985     headers=additional_headers,
    986 )
    987 resolve_checksum_context(request_dict, operation_model, api_params)

File ~/PythonCode/ANPR_deploy/.venv/lib/python3.12/site-packages/botocore/client.py:1047, in BaseClient._convert_to_request_dict(self, api_params, operation_model, endpoint_url, context, headers, set_user_agent_header)
   1038 def _convert_to_request_dict(
   1039     self,
...
    203         total_usage=total_usage,
    204     ) from e

InstructorRetryException: Parameter validation failed:
Invalid type for parameter messages[0].content[0].image.source, value: source='iVB....'  media_type='image/png', type: <class 'instructor.multimodal.Image'>, valid types: <class 'dict'>
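
The error above says `image.source` must be a plain dict rather than an `instructor.multimodal.Image`. For reference, a minimal sketch of the image content-block shape as documented for the Bedrock Converse API, assuming that is the operation being called (the key list in the earlier error suggests so). `to_converse_message` is a hypothetical helper, and whether instructor passes such dicts through unchanged is not confirmed here:

```python
import base64


def to_converse_message(text: str, image_b64: str, image_format: str = "png") -> dict:
    """Build a user message in the Converse content-block shape (hypothetical helper)."""
    return {
        "role": "user",
        "content": [
            {
                "image": {
                    # Converse expects a short format name such as "png",
                    # not a MIME type such as "image/png".
                    "format": image_format,
                    # boto3 takes raw bytes here, so the base64 string is decoded.
                    "source": {"bytes": base64.b64decode(image_b64)},
                }
            },
            {"text": text},
        ],
    }


# Placeholder payload standing in for real image data.
fake_b64 = base64.b64encode(b"not a real image").decode("ascii")
message = to_converse_message("Extract the VRM", fake_b64)
print(list(message["content"][0]["image"].keys()))
```

Each content block here sets a single key (`image` or `text`), and `source` is a dict, which are the two things the validation errors complain about.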

Expected behavior

It's unclear from the docs whether multimodal Bedrock is supported. I expected this request to be formatted correctly, but there is clearly some missing logic to map it to the expected boto3 request format. Either that, or I'm formatting messages incorrectly for this client.

If the library doesn't have this functionality and you're able to point me to the logic that needs adding, I'm happy to try to contribute it!

Apr 03 '25 15:04 OliverThomas2000

You need to specify the BEDROCK_TOOLS mode:

client = instructor.from_bedrock(bedrock_runtime, mode=instructor.Mode.BEDROCK_TOOLS)

Alternatively, use the Auto Client Setup:

import instructor

# Auto client with model specification
client = instructor.from_provider("bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")

# The auto client automatically handles:
# - AWS credential detection from environment
# - Region configuration (defaults to us-east-1)
# - Mode selection based on model (Claude models use BEDROCK_TOOLS)

Sep 12 '25 18:09 kelvin-tran