Support LLMs through Cloud Vendors

Open mattmorgis opened this issue 9 months ago • 18 comments

I have started working on a PR to add support for AWS Bedrock to the Anthropic Augmented LLM and Azure for the OpenAI one.

Also looking for any thoughts/feedback on this before I get too far in. Just curious whether there's anything in the works, or planned to be done a specific way, before I take a pass at it. Otherwise I'll have at it.

mattmorgis avatar Mar 05 '25 20:03 mattmorgis

I'd be grateful for contributions to this! I think adding it similar to the OpenAI and Anthropic support would be superb. I haven't prioritized adding additional providers to focus on building the library capabilities out, so any help here would be appreciated.

One thing that could help is if there are providers that support the OpenAI API format, we could reuse almost the entirety of the OpenAIAugmentedLLM class. We already support specifying a base_url in the config, which is how the Ollama integration works. But it could be more explicitly done as well, so have at it!
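
To illustrate the reuse pattern (this is just the plain OpenAI SDK pointed at an assumed local Ollama endpoint, not mcp-agent's exact config surface), targeting an OpenAI-compatible provider is mostly a matter of swapping the base_url:

from openai import OpenAI

# Ollama (and several hosted providers) expose an OpenAI-compatible endpoint,
# so the same client code works by changing base_url and api_key.
client = OpenAI(
    base_url="http://localhost:11434/v1",  # assumed local Ollama endpoint
    api_key="ollama",  # placeholder; Ollama ignores the key
)

response = client.chat.completions.create(
    model="llama3.2",  # whatever model the endpoint serves
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)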

saqadri avatar Mar 05 '25 21:03 saqadri

Maybe integrate with this? https://www.litellm.ai/ supports 100+ LLMs.
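
For reference, LiteLLM usage is roughly like this (a sketch; the model strings just show LiteLLM's provider-prefix convention, and provider credentials are expected in the environment):

from litellm import completion

# The same call routes to different providers based on the model string prefix,
# e.g. "bedrock/...", "azure/<deployment>", "ollama/llama3", "gpt-4o".
response = completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)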

hrishikeshio avatar Mar 06 '25 11:03 hrishikeshio

I think that will be great @hrishikeshio

saqadri avatar Mar 06 '25 18:03 saqadri

AWS SageMaker would be great to have. Here is the code I have for a SageMaker client. How can I leverage this code?

import json

from botocore.exceptions import ClientError


def get_streaming_response(self, prompt):
    """
    Sends a structured prompt to the SageMaker LLM endpoint and streams the response.

    Assumes self.client is a boto3 "sagemaker-runtime" client and
    self.endpoint_name is the name of a deployed SageMaker endpoint.

    :param prompt: The structured prompt, formatted as a JSON list.
    :return: The botocore EventStream from the endpoint response.
    """
    try:
        # Define inference parameters with streaming enabled
        inference_params = {
            "do_sample": True,
            "temperature": 0.1,
            "top_k": 50,
            "max_new_tokens": 512,
            "repetition_penalty": 1.03,
            "stop": ["</s>", "<|system|>", "<|user|>", "<|assistant|>"],
            "return_full_text": False,
        }

        body = json.dumps({"inputs": prompt, "parameters": inference_params, "stream": True})

        # Invoke the SageMaker endpoint with response streaming
        response = self.client.invoke_endpoint_with_response_stream(
            EndpointName=self.endpoint_name,
            Body=body,
            ContentType="application/json",
        )

        # The "Body" field is an event stream the caller can iterate chunk by chunk
        event_stream = response["Body"]

        return event_stream
    except ClientError as error:
        # Surface endpoint/credential errors to the caller
        raise RuntimeError(f"SageMaker invocation failed: {error}") from error
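
For reference, the returned event stream could be consumed roughly like this (a sketch; it assumes the model container emits newline-delimited JSON, so the parsing would need to match whatever framing your container actually uses):

import json

def iter_stream_chunks(event_stream):
    """Yield decoded chunks from a SageMaker response EventStream."""
    buffer = b""
    for event in event_stream:
        # Each event carries a PayloadPart with raw bytes from the container.
        part = event.get("PayloadPart")
        if not part:
            continue
        buffer += part["Bytes"]
        # Assume newline-delimited JSON; incomplete chunks stay buffered.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)

# Hypothetical usage with the method above:
# for chunk in iter_stream_chunks(llm.get_streaming_response(prompt)):
#     print(chunk)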

aatish-shinde avatar Mar 13 '25 18:03 aatish-shinde

Hi @aatish-shinde,

I'm a little confused: are you fine-tuning an LLM with SageMaker?

From initial Google searches, invoke_endpoint_with_response_stream seems to be a method on the AWS Bedrock client?

  • https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html
  • https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime/client/invoke_model_with_response_stream.html

mattmorgis avatar Mar 13 '25 18:03 mattmorgis

@MattMorgis No, I am not fine-tuning. But I think creating the Bedrock and SageMaker clients works the same way; you just have to mention which AWS resource you are calling. You can actually create the client using sagemaker-runtime too, like this:

self.client = boto3.client("sagemaker-runtime", region_name="us-east-2")

# Invoke the SageMaker endpoint with response streaming
response = self.client.invoke_endpoint_with_response_stream(
    EndpointName=self.endpoint_name,
    Body=body,
    ContentType="application/json",
)

aatish-shinde avatar Mar 13 '25 19:03 aatish-shinde

@saqadri I could use some input.

There are two ways I could possibly take this:

  1. An AWSBedrockAugmentedLLM and an AzureOpenAIAugmentedLLM. The Bedrock one, for example, could expose access to Claude, Llama, Titan, Mistral, or any other models in Bedrock. It would use boto3 and the Bedrock client.

  2. Update the existing AnthropicAugmentedLLM and OpenAIAugmentedLLM. As you mentioned, they can be reused almost completely; the only difference for both is authentication. AnthropicAugmentedLLM would continue to use the Anthropic library.

I guess it really boils down to whether the framework wants to organize support by:

  • model - AnthropicAugmentedLLM would be used for Claude, whether accessed via Anthropic directly, AWS Bedrock, or Google Vertex.
  • provider - choose your model based on your provider: AnthropicAugmentedLLM, AWSBedrockAugmentedLLM, GoogleVertexAugmentedLLM.

Let me know if that makes sense.

mattmorgis avatar Mar 13 '25 19:03 mattmorgis

@aatish-shinde What is self.endpoint_name defined as? What model are you using?

mattmorgis avatar Mar 13 '25 19:03 mattmorgis

@MattMorgis It is just a name; I could name it "abcd". You usually just create an endpoint with a name in AWS SageMaker and instantiate its boto3 client: boto3.client('sagemaker-runtime', region_name="us-east-2"). There is also another way of invoking it over HTTP: POST /endpoints/EndpointName/invocations-response-stream HTTP/1.1

https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpointWithResponseStream.html

I think for AWS Bedrock you do something similar:

client = boto3.client(service_name="bedrock-runtime")
response = client.invoke_model_with_response_stream(
    modelId="anthropic.claude-v2",
    body=body,
)

aatish-shinde avatar Mar 13 '25 19:03 aatish-shinde

@MattMorgis I thought about this some more, and I think it makes sense to expose it by provider, even if the core implementations share the same base classes. Currently, I see providers as supporting multiple models via the same API schema (e.g. OpenAI has a bunch of models available via the same interface, and Anthropic and Together AI do too). Some providers have also standardized on another provider's API schema (e.g. OpenAI-compatible API endpoints from Azure).

So instead of having LLaMAAugmentedLLM, it would be better to choose the provider (e.g. TogetherAugmentedLLM), and specify the model preferences or model ID (just like we do with RequestParams).

saqadri avatar Mar 14 '25 20:03 saqadri

Might be worth taking a look at: https://github.com/evalstate/fast-agent/blob/935e6c627158438c8df488541e63a20802b18720/src/mcp_agent/workflows/llm/model_factory.py#L4

This supports a dot notation for provider.model.reasoning-level
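
As a toy illustration of that notation (not the actual model_factory parsing logic), the string splits into up to three parts:

from typing import NamedTuple, Optional

class ModelSpec(NamedTuple):
    provider: str
    model: str
    reasoning_level: Optional[str] = None

def parse_model_spec(spec: str) -> ModelSpec:
    # e.g. "openai.o3-mini.high" -> ("openai", "o3-mini", "high")
    parts = spec.split(".")
    if len(parts) == 2:
        return ModelSpec(parts[0], parts[1])
    if len(parts) == 3:
        return ModelSpec(parts[0], parts[1], parts[2])
    raise ValueError(f"Expected provider.model[.reasoning-level], got: {spec!r}")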

evalstate avatar Mar 15 '25 14:03 evalstate

Just want to add some input that might be helpful on this topic. The various models in Bedrock do not have a unified API; Claude requires a different input structure than Llama, for example. AWS recently addressed this by releasing the Converse API, which works with most models (I believe it works with any model that supports tool use) but still leaves room for model-specific inference parameters.

I would highly recommend implementing this feature using the Converse API, since everything is moving in that direction and it will be significantly less work: a single interface can prompt many models. If you use the InvokeModel API, it will most likely require a custom implementation for every model.
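
For anyone unfamiliar, the Converse request/response shape looks roughly like this (the model ID and region are just examples):

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# The same request shape works across Converse-capable models; only modelId changes.
response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": [{"text": "Hello!"}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.1},
)
print(response["output"]["message"]["content"][0]["text"])

# Streaming is the converse_stream variant, which yields contentBlockDelta events.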

@MattMorgis I'm very interested in this feature since I can't really use this framework until there is Bedrock support, and my company is about to rewrite one of our agentic apps very soon. I would really like to use this framework for that, but we are required to only use Bedrock. If you need any help or want to piece any of this feature out to speed it up, please let me know.

BTripp1986 avatar Mar 24 '25 22:03 BTripp1986

@BTripp1986 thanks for this suggestion! I can help prioritize this work to make sure you can use mcp-agent for the agent app rewrite. Please find me on discord (@saqadri) and we can chat through it as well.

saqadri avatar Mar 25 '25 03:03 saqadri

Sounds great! Discord request sent from btrippcode

BTripp1986 avatar Mar 25 '25 10:03 BTripp1986

LiteLLM is an excellent choice if we can support it; it opens up all the possibilities! Looking forward to it!

rwang1987 avatar Mar 26 '25 18:03 rwang1987

I could help you to test LiteLLM with Azure~

rwang1987 avatar Mar 26 '25 18:03 rwang1987

Do we support Claude on Vertex? It does not seem to be supported in GoogleAugmentedLLM, and I could not find any Vertex-related code in AnthropicAugmentedLLM, so I'm thinking not? I was confused because I saw something to the contrary in the benchmarks JSON file...

rahuldave avatar Jun 27 '25 13:06 rahuldave

Hmm, good question @rahuldave. I'll investigate the best way to support this and get back to you. We probably need to enable the Vertex SDK.
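
One possible path (an assumption on my part, not a committed plan): the Anthropic Python SDK ships an AnthropicVertex client, so a Vertex-backed variant could swap the client construction while keeping the rest of the Anthropic message handling, roughly like:

from anthropic import AnthropicVertex

# Placeholders: project ID, region, and model name are examples only.
# Authentication goes through Google Application Default Credentials.
client = AnthropicVertex(project_id="my-gcp-project", region="us-east5")

message = client.messages.create(
    model="claude-3-5-sonnet-v2@20241022",  # Vertex model IDs use an "@" date suffix
    max_tokens=512,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)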

saqadri avatar Jun 27 '25 15:06 saqadri