
Python: Bug: Cannot stream with 3.7 sonnet

Open philippHorn opened this issue 11 months ago • 6 comments

Describe the bug

In order to use the latest Claude model, I need to use an inference profile as the model ID: https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-use.html

This works with the Converse operation, but does not seem to work with GetFoundationModel: https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetFoundationModel.html

So right now, if I use an inference profile as a model ID and do streaming, I get this error:

botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the GetFoundationModel operation: The provided model identifier is invalid.

But if I use the plain foundation model ID, I get this:

botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the Converse operation: Invocation of model ID anthropic.claude-3-7-sonnet-20250219-v1:0 with on-demand throughput isn't supported. Retry your request with the ID or ARN of an inference profile that contains this model.

It seems like the latest Claude model only supports inference profiles, so I see no way to run the latest Sonnet model with streaming. Or am I missing something?

To Reproduce

I can reproduce the issue with this script. If I swap MODEL_ID for INFERENCE_PROFILE_ID, the error switches between the two errors posted above.

import asyncio

import boto3
from django.conf import settings
from semantic_kernel.connectors.ai.anthropic import (
    AnthropicChatPromptExecutionSettings,
)
from semantic_kernel.connectors.ai.bedrock import BedrockChatCompletion
from semantic_kernel.contents import (
    ChatHistory,
    ChatMessageContent,
    AuthorRole,
    TextContent,
)

AWS_AI_REGION = "us-east-1"
MODEL_ID = "anthropic.claude-3-7-sonnet-20250219-v1:0"
INFERENCE_PROFILE_ID = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"

bedrock_client = boto3.client(
    "bedrock",
    region_name=AWS_AI_REGION,
    aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
)
bedrock_runtime_client = boto3.client(
    "bedrock-runtime",
    region_name=AWS_AI_REGION,
    aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
)


async def main() -> None:
    sk_client = BedrockChatCompletion(
        model_id=INFERENCE_PROFILE_ID,
        client=bedrock_client,
        runtime_client=bedrock_runtime_client,
    )
    llm_settings = AnthropicChatPromptExecutionSettings(
        temperature=0.2,
    )
    history = ChatHistory(
        messages=[
            ChatMessageContent(role=AuthorRole.USER, items=[TextContent(text="hi")])
        ]
    )
    async for item in sk_client.get_streaming_chat_message_contents(
        history, llm_settings
    ):
        print(item)


if __name__ == "__main__":
    asyncio.run(main())

Expected behavior

The script should stream messages.

Platform

  • Language: Python
  • Source: semantic-kernel==1.24.0
  • AI model: anthropic.claude-3-7-sonnet-20250219-v1:0 (via Amazon Bedrock)
  • OS: Mac

Note: My understanding of AWS is not that deep; I hope what I wrote here is correct and makes sense.

philippHorn avatar Mar 12 '25 16:03 philippHorn

@philippHorn is this issue specific to Semantic Kernel's BedrockChatCompletion? Meaning, do you still hit the error while directly calling SK's BedrockChatCompletion? I want to make sure it is indeed SK and not something related to AutoGen's abstractions. Thanks.

moonbox3 avatar Mar 13 '25 02:03 moonbox3

@moonbox3 I adapted the script so it no longer uses AutoGen. The error does happen when using BedrockChatCompletion directly. I'll post both full tracebacks as well:

Using INFERENCE_PROFILE_ID:

Traceback (most recent call last):
  File "repro.py", line 64, in <module>
    asyncio.run(main())
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "repro.py", line 57, in main
    async for item in sk_client.get_streaming_chat_message_contents(
  File ".venv/lib/python3.10/site-packages/semantic_kernel/connectors/ai/chat_completion_client_base.py", line 249, in get_streaming_chat_message_contents
    async for streaming_chat_message_contents in self._inner_get_streaming_chat_message_contents(
  File ".venv/lib/python3.10/site-packages/semantic_kernel/utils/telemetry/model_diagnostics/decorators.py", line 165, in wrapper_decorator
    async for streaming_chat_message_contents in completion_func(*args, **kwargs):
  File ".venv/lib/python3.10/site-packages/semantic_kernel/connectors/ai/bedrock/services/bedrock_chat_completion.py", line 131, in _inner_get_streaming_chat_message_contents
    model_info = await self.get_foundation_model_info(self.ai_model_id)
  File ".venv/lib/python3.10/site-packages/semantic_kernel/connectors/ai/bedrock/services/bedrock_base.py", line 46, in get_foundation_model_info
    response = await run_in_executor(
  File ".venv/lib/python3.10/site-packages/semantic_kernel/utils/async_utils.py", line 11, in run_in_executor
    return await asyncio.get_event_loop().run_in_executor(executor, partial(func, *args, **kwargs))
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File ".venv/lib/python3.10/site-packages/botocore/client.py", line 569, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File ".venv/lib/python3.10/site-packages/botocore/client.py", line 1023, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the GetFoundationModel operation: The provided model identifier is invalid.

Using MODEL_ID:

Traceback (most recent call last):
  File "repro.py", line 64, in <module>
    asyncio.run(main())
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "repro.py", line 57, in main
    async for item in sk_client.get_streaming_chat_message_contents(
  File ".venv/lib/python3.10/site-packages/semantic_kernel/connectors/ai/chat_completion_client_base.py", line 249, in get_streaming_chat_message_contents
    async for streaming_chat_message_contents in self._inner_get_streaming_chat_message_contents(
  File ".venv/lib/python3.10/site-packages/semantic_kernel/utils/telemetry/model_diagnostics/decorators.py", line 165, in wrapper_decorator
    async for streaming_chat_message_contents in completion_func(*args, **kwargs):
  File ".venv/lib/python3.10/site-packages/semantic_kernel/connectors/ai/bedrock/services/bedrock_chat_completion.py", line 140, in _inner_get_streaming_chat_message_contents
    response_stream = await self._async_converse_streaming(**prepared_settings)
  File ".venv/lib/python3.10/site-packages/semantic_kernel/connectors/ai/bedrock/services/bedrock_chat_completion.py", line 286, in _async_converse_streaming
    return await run_in_executor(
  File ".venv/lib/python3.10/site-packages/semantic_kernel/utils/async_utils.py", line 11, in run_in_executor
    return await asyncio.get_event_loop().run_in_executor(executor, partial(func, *args, **kwargs))
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File ".venv/lib/python3.10/site-packages/botocore/client.py", line 569, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File ".venv/lib/python3.10/site-packages/botocore/client.py", line 1023, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the ConverseStream operation: Invocation of model ID anthropic.claude-3-7-sonnet-20250219-v1:0 with on-demand throughput isn’t supported. Retry your request with the ID or ARN of an inference profile that contains this model.

philippHorn avatar Mar 13 '25 14:03 philippHorn

Thanks for your response, @philippHorn. @TaoChenOSU what are your thoughts?

moonbox3 avatar Mar 13 '25 21:03 moonbox3

I think a similar commit to https://github.com/microsoft/semantic-kernel/pull/10859 is needed for Python as well.

riywo avatar Apr 01 '25 20:04 riywo

> I think a similar commit to #10859 is needed for python as well.

Thanks, @riywo. The handling for Python was done in #10329.

moonbox3 avatar Apr 02 '25 02:04 moonbox3

The problem with this issue is that when chat completion is requested with streaming enabled, the service verifies that the model supports streaming by calling bedrock_client.get_foundation_model(model_id). Since inference profiles are not listed among the foundation models, that call fails with an error about the provided model ID.
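One possible workaround (a sketch only, not the library's actual fix; the helper name and the set of geo prefixes are assumptions) would be to strip the cross-region prefix from an inference profile ID before calling get_foundation_model, while still passing the original profile ID to ConverseStream:

```python
# Sketch: recover the underlying foundation model ID from a cross-region
# inference profile ID such as "us.anthropic.claude-3-7-sonnet-20250219-v1:0".
# Assumption: cross-region profiles prepend a geo prefix to the model ID.
GEO_PREFIXES = {"us", "eu", "apac"}

def base_model_id(model_id: str) -> str:
    prefix, _, rest = model_id.partition(".")
    if prefix in GEO_PREFIXES and rest:
        return rest
    return model_id
```

With a helper like this, the streaming-capability check could call get_foundation_model(base_model_id(self.ai_model_id)) instead of passing the inference profile ID straight through.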

Could this be looked into and fixed, please?

diogomcsousa avatar Apr 22 '25 12:04 diogomcsousa

Now with 4.0 🙏

dnascimento avatar May 23 '25 06:05 dnascimento