Multimodal Bedrock causes error in botocore package
- [x] This is actually a bug report.
- [x] I am not getting good LLM Results
- [x] I have tried searching the documentation and have not found an answer.
What Model are you using?
- [ ] gpt-3.5-turbo
- [ ] gpt-4-turbo
- [ ] gpt-4
- [x] Other (please specify)
Claude 3 sonnet using AWS Bedrock
Describe the bug
When attempting to use bedrock for multimodal requests I am unable to format the messages in a way that satisfies the underlying bedrock client. I've tried all the example message structures in the multimodal example in the docs.
To Reproduce
I'm using the bedrock client like so:
import boto3
import instructor
bedrock_runtime = boto3.client("bedrock-runtime")
client = instructor.from_bedrock(bedrock_runtime)
My messages list looks something like this:
image = instructor.Image.autodetect(<the base 64 content>)
messages = [
    {
        "role": "user",
        "content": ["find the vrm", image]
    }
]
response = client.chat.completions.create(messages=messages, modelId="<modelId>", response_model=VRMResponse)
If I try this I receive this error:
InstructorRetryException: Parameter validation failed:
Invalid number of parameters set for tagged union structure messages[0].content[0]. Can only set one of the following keys: text, image, document, video, toolUse, toolResult, guardContent.
Unknown parameter in messages[0].content[0]: "type", must be one of: text, image, document, video, toolUse, toolResult, guardContent
Invalid number of parameters set for tagged union structure messages[0].content[1]. Can only set one of the following keys: text, image, document, video, toolUse, toolResult, guardContent.
Unknown parameter in messages[0].content[1]: "type", must be one of: text, image, document, video, toolUse, toolResult, guardContent
Unknown parameter in messages[0].content[1]: "image_url", must be one of: text, image, document, video, toolUse, toolResult, guardContent
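Reading the error above, each content block appears to be treated as a tagged union: a dict with exactly one of the listed keys, and no OpenAI-style "type" or "image_url" keys. A minimal sketch of that rule as a local pre-check follows. The helper name check_content_block is hypothetical, the key list is copied from the error message, and this only approximates what botocore validates:

```python
# Keys accepted for a content block, copied from the error message above.
ALLOWED_KEYS = {"text", "image", "document", "video", "toolUse", "toolResult", "guardContent"}


def check_content_block(block):
    """Hypothetical pre-check: return a list of problems (empty means the shape looks valid)."""
    if not isinstance(block, dict):
        return [f"block must be a dict, got {type(block).__name__}"]
    problems = []
    # Any key outside the allowed set is reported, in sorted order.
    for key in sorted(set(block) - ALLOWED_KEYS):
        problems.append(f"unknown key {key!r}")
    # A tagged union may carry only one key at a time.
    if len(block) != 1:
        problems.append(f"expected exactly one key, got {len(block)}")
    return problems


# OpenAI-style block, the shape the error message suggests was sent.
openai_style = {"type": "text", "text": "find the vrm"}
# Block with a single allowed key.
single_key_style = {"text": "find the vrm"}

print(check_content_block(openai_style))
print(check_content_block(single_key_style))
```

Running a check like this before the request makes it easier to see which block shape is being rejected without going through the retry loop.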
The most success I've had was formatting messages like this (closer to the boto3 standard):
messages = [
    {
        "role": "user",
        "content": [
            {
                "image": {
                    "data": image,
                    "media_type": "image/png"
                }
            },
            {
                "text": "Extract the VRM"
            }
        ]
    }
]
This fails with an error (inside botocore):
ParamValidationError Traceback (most recent call last)
File ~/PythonCode/ANPR_deploy/.venv/lib/python3.12/site-packages/instructor/retry.py:168, in retry_sync(func, response_model, args, kwargs, context, max_retries, strict, mode, hooks)
167 hooks.emit_completion_arguments(*args, **kwargs)
--> 168 response = func(*args, **kwargs)
169 hooks.emit_completion_response(response)
File ~/PythonCode/ANPR_deploy/.venv/lib/python3.12/site-packages/botocore/client.py:569, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
568 # The "self" in this scope is referring to the BaseClient.
--> 569 return self._make_api_call(operation_name, kwargs)
File ~/PythonCode/ANPR_deploy/.venv/lib/python3.12/site-packages/botocore/client.py:980, in BaseClient._make_api_call(self, operation_name, api_params)
979 request_context['endpoint_properties'] = properties
--> 980 request_dict = self._convert_to_request_dict(
981 api_params=api_params,
982 operation_model=operation_model,
983 endpoint_url=endpoint_url,
984 context=request_context,
985 headers=additional_headers,
986 )
987 resolve_checksum_context(request_dict, operation_model, api_params)
File ~/PythonCode/ANPR_deploy/.venv/lib/python3.12/site-packages/botocore/client.py:1047, in BaseClient._convert_to_request_dict(self, api_params, operation_model, endpoint_url, context, headers, set_user_agent_header)
1038 def _convert_to_request_dict(
1039 self,
...
203 total_usage=total_usage,
204 ) from e
InstructorRetryException: Parameter validation failed:
Invalid type for parameter messages[0].content[0].image.source, value: source='iVB....' media_type='image/png', type: <class 'instructor.multimodal.Image'>, valid types: <class 'dict'>
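The last line of the error says image.source must be a plain dict, not an instructor Image object. As far as I understand the Bedrock Converse API, an image block takes a "format" string and a "source" dict holding raw bytes, with boto3 handling the wire encoding itself. A minimal sketch under that assumption follows. The helper to_converse_image_block is hypothetical, and whether instructor forwards such a dict unchanged is not confirmed here:

```python
import base64


def to_converse_image_block(b64_data: str, image_format: str = "png") -> dict:
    """Hypothetical helper: wrap base64 image data in a Converse-style image block."""
    return {
        "image": {
            # Assumed format values: "png", "jpeg", "gif", "webp".
            "format": image_format,
            # Raw bytes rather than a base64 string or an Image object.
            "source": {"bytes": base64.b64decode(b64_data)},
        }
    }


# Placeholder payload standing in for real base64-encoded image content.
sample_b64 = base64.b64encode(b"not-a-real-image").decode("ascii")

messages = [
    {
        "role": "user",
        "content": [
            to_converse_image_block(sample_b64),
            {"text": "Extract the VRM"},
        ],
    }
]
```

Here every content block is a dict with a single key, which matches the shape the earlier validation errors asked for.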
Expected behavior
It's unclear from the docs whether multimodal bedrock is supported. I expected this request to be formatted correctly but clearly there's some missing logic to map to the expected boto3 request format. Either this or I'm incorrectly formatting messages for this client.
If the library doesn't have this functionality and you're able to point me to the logic that needs adding, I'm happy to try to contribute it!
You need to specify the BEDROCK_TOOLS mode when creating the client:
client = instructor.from_bedrock(bedrock_runtime, mode=instructor.Mode.BEDROCK_TOOLS)
Alternatively, use the Auto Client Setup:
import instructor
# Auto client with model specification
client = instructor.from_provider("bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")
# The auto client automatically handles:
# - AWS credential detection from environment
# - Region configuration (defaults to us-east-1)
# - Mode selection based on model (Claude models use BEDROCK_TOOLS)