Python: Bug: Cannot stream with Claude 3.7 Sonnet
Describe the bug
To use the latest Claude model, I need to pass an inference profile as the model ID: https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-use.html
This works with the converse operation, but does not seem to work with https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetFoundationModel.html
So right now, if I use an inference profile as a model ID and do streaming, I get this error:
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the GetFoundationModel operation: The provided model identifier is invalid.
But if I use the plain model ID, I get this:
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the Converse operation: Invocation of model ID anthropic.claude-3-7-sonnet-20250219-v1:0 with on-demand throughput isn't supported. Retry your request with the ID or ARN of an inference profile that contains this model.
It seems like the latest claude model only supports inference profiles. So as a result I see no way to run the latest sonnet model with streaming, or am I missing something?
To Reproduce
I can reproduce the issue with the script below. Swapping MODEL_ID and INFERENCE_PROFILE_ID toggles between the two errors posted above.
```python
import asyncio

import boto3
from django.conf import settings
from semantic_kernel.connectors.ai.anthropic import (
    AnthropicChatPromptExecutionSettings,
)
from semantic_kernel.connectors.ai.bedrock import BedrockChatCompletion
from semantic_kernel.contents import (
    ChatHistory,
    ChatMessageContent,
    AuthorRole,
    TextContent,
)

AWS_AI_REGION = "us-east-1"
MODEL_ID = "anthropic.claude-3-7-sonnet-20250219-v1:0"
INFERENCE_PROFILE_ID = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"

bedrock_client = boto3.client(
    "bedrock",
    region_name=AWS_AI_REGION,
    aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
)
bedrock_runtime_client = boto3.client(
    "bedrock-runtime",
    region_name=AWS_AI_REGION,
    aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
)


async def main() -> None:
    sk_client = BedrockChatCompletion(
        model_id=INFERENCE_PROFILE_ID,
        client=bedrock_client,
        runtime_client=bedrock_runtime_client,
    )
    llm_settings = AnthropicChatPromptExecutionSettings(
        temperature=0.2,
    )
    history = ChatHistory(
        messages=[
            ChatMessageContent(role=AuthorRole.USER, items=[TextContent(text="hi")])
        ]
    )
    async for item in sk_client.get_streaming_chat_message_contents(
        history, llm_settings
    ):
        print(item)


if __name__ == "__main__":
    asyncio.run(main())
```
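For what it's worth, the Bedrock runtime itself accepts an inference profile ID for streaming; only the GetFoundationModel lookup rejects it. The sketch below (untested against a live account; the region is assumed, and credentials come from the default chain rather than the Django settings above) bypasses Semantic Kernel and calls ConverseStream directly as a possible workaround. The `extract_text_deltas` helper is pure, so it can be sanity-checked without AWS access.

```python
def extract_text_deltas(events):
    """Yield the text fragments carried in ConverseStream events.

    Text arrives under contentBlockDelta -> delta -> text; other event
    types (messageStart, messageStop, metadata, ...) are skipped.
    """
    for event in events:
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        if "text" in delta:
            yield delta["text"]


def stream_reply(model_id: str, prompt: str, region: str = "us-east-1") -> str:
    """Call ConverseStream directly; bedrock-runtime accepts an inference
    profile ID as modelId even though get_foundation_model rejects it."""
    import boto3  # imported here so the pure helper above works without boto3

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.converse_stream(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return "".join(extract_text_deltas(response["stream"]))


if __name__ == "__main__":
    print(stream_reply("us.anthropic.claude-3-7-sonnet-20250219-v1:0", "hi"))
```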
Expected behavior
The script should stream messages.
Platform
- Language: Python
- Source: semantic-kernel==1.24.0
- AI model: Bedrock anthropic.claude-3-7-sonnet-20250219-v1:0 (via inference profile us.anthropic.claude-3-7-sonnet-20250219-v1:0)
- OS: Mac
Note: My understanding of AWS is not that deep; I hope what I wrote here is correct and makes sense.
@philippHorn is this issue specific to Semantic Kernel's BedrockChatCompletion? Meaning, do you still hit the error while directly calling SK's BedrockChatCompletion? I want to make sure it is indeed SK and not something related to AutoGen's abstractions. Thanks.
@moonbox3 I adapted the script now to not use autogen. It does happen when using BedrockChatCompletion. I'll post the two full tracebacks I get as well:
Using INFERENCE_PROFILE_ID:

```
Traceback (most recent call last):
  File "repro.py", line 64, in <module>
    asyncio.run(main())
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "repro.py", line 57, in main
    async for item in sk_client.get_streaming_chat_message_contents(
  File ".venv/lib/python3.10/site-packages/semantic_kernel/connectors/ai/chat_completion_client_base.py", line 249, in get_streaming_chat_message_contents
    async for streaming_chat_message_contents in self._inner_get_streaming_chat_message_contents(
  File ".venv/lib/python3.10/site-packages/semantic_kernel/utils/telemetry/model_diagnostics/decorators.py", line 165, in wrapper_decorator
    async for streaming_chat_message_contents in completion_func(*args, **kwargs):
  File ".venv/lib/python3.10/site-packages/semantic_kernel/connectors/ai/bedrock/services/bedrock_chat_completion.py", line 131, in _inner_get_streaming_chat_message_contents
    model_info = await self.get_foundation_model_info(self.ai_model_id)
  File ".venv/lib/python3.10/site-packages/semantic_kernel/connectors/ai/bedrock/services/bedrock_base.py", line 46, in get_foundation_model_info
    response = await run_in_executor(
  File ".venv/lib/python3.10/site-packages/semantic_kernel/utils/async_utils.py", line 11, in run_in_executor
    return await asyncio.get_event_loop().run_in_executor(executor, partial(func, *args, **kwargs))
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File ".venv/lib/python3.10/site-packages/botocore/client.py", line 569, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File ".venv/lib/python3.10/site-packages/botocore/client.py", line 1023, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the GetFoundationModel operation: The provided model identifier is invalid.
```
Using MODEL_ID:

```
Traceback (most recent call last):
  File "repro.py", line 64, in <module>
    asyncio.run(main())
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "repro.py", line 57, in main
    async for item in sk_client.get_streaming_chat_message_contents(
  File ".venv/lib/python3.10/site-packages/semantic_kernel/connectors/ai/chat_completion_client_base.py", line 249, in get_streaming_chat_message_contents
    async for streaming_chat_message_contents in self._inner_get_streaming_chat_message_contents(
  File ".venv/lib/python3.10/site-packages/semantic_kernel/utils/telemetry/model_diagnostics/decorators.py", line 165, in wrapper_decorator
    async for streaming_chat_message_contents in completion_func(*args, **kwargs):
  File ".venv/lib/python3.10/site-packages/semantic_kernel/connectors/ai/bedrock/services/bedrock_chat_completion.py", line 140, in _inner_get_streaming_chat_message_contents
    response_stream = await self._async_converse_streaming(**prepared_settings)
  File ".venv/lib/python3.10/site-packages/semantic_kernel/connectors/ai/bedrock/services/bedrock_chat_completion.py", line 286, in _async_converse_streaming
    return await run_in_executor(
  File ".venv/lib/python3.10/site-packages/semantic_kernel/utils/async_utils.py", line 11, in run_in_executor
    return await asyncio.get_event_loop().run_in_executor(executor, partial(func, *args, **kwargs))
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File ".venv/lib/python3.10/site-packages/botocore/client.py", line 569, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File ".venv/lib/python3.10/site-packages/botocore/client.py", line 1023, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the ConverseStream operation: Invocation of model ID anthropic.claude-3-7-sonnet-20250219-v1:0 with on-demand throughput isn't supported. Retry your request with the ID or ARN of an inference profile that contains this model.
```
Thanks for your response, @philippHorn. @TaoChenOSU what are your thoughts?
I think a similar commit to https://github.com/microsoft/semantic-kernel/pull/10859 is needed for python as well.
Thanks, @riywo. The handling for Python was done in #10329.
The root cause: when chat completion with streaming enabled is requested, the connector verifies that the model supports streaming by calling `bedrock_client.get_foundation_model(modelIdentifier=model_id)`. Since inference profiles are not listed among the foundation models, that call fails with a validation error about the provided model ID.
Could this be looked into and fixed, please?
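One possible direction (a hypothetical sketch, not SK's actual implementation: the helper names, the prefix list, and the fallback strategy are all assumptions) is to retry the lookup with the cross-region prefix stripped, turning the inference-profile ID back into a plain foundation-model ID:

```python
# Assumed list of cross-region inference-profile geo prefixes.
GEO_PREFIXES = ("us.", "eu.", "apac.")


def base_model_id(model_id: str) -> str:
    """Strip a cross-region inference-profile prefix, if present, so the
    underlying foundation model can be looked up."""
    for prefix in GEO_PREFIXES:
        if model_id.startswith(prefix):
            return model_id[len(prefix):]
    return model_id


def get_model_details(bedrock_client, model_id: str) -> dict:
    """Try the ID as given; on a ValidationException, retry with the
    prefix stripped (hypothetical fallback, not the fix shipped in SK)."""
    try:
        return bedrock_client.get_foundation_model(
            modelIdentifier=model_id
        )["modelDetails"]
    except bedrock_client.exceptions.ValidationException:
        return bedrock_client.get_foundation_model(
            modelIdentifier=base_model_id(model_id)
        )["modelDetails"]
```

With this, `get_model_details(bedrock_client, "us.anthropic.claude-3-7-sonnet-20250219-v1:0")` would fall back to looking up `anthropic.claude-3-7-sonnet-20250219-v1:0`, whose details (including streaming support) are listed.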
Now with 4.0 🙏