mcp-agent
Support LLMs through Cloud Vendors
I have started working on a PR to add support for AWS Bedrock to the Anthropic Augmented LLM and Azure for the OpenAI one.
Also looking for any thoughts / feedback on this before I get too far into it. Just curious if there is anything in the works or planned to be done a specific way before I take a pass at it. Otherwise I'll have at it.
I'd be grateful for contributions to this! I think adding it similarly to the OpenAI and Anthropic support would be superb. I haven't prioritized adding additional providers so that I could focus on building out the library's capabilities, so any help here would be appreciated.
One thing that could help: for providers that support the OpenAI API format, we could reuse almost the entirety of the OpenAIAugmentedLLM class. We already support specifying a base_url in the config, which is how the Ollama integration works. But it could be done more explicitly as well, so have at it!
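For anyone picking this up, the mechanism that reuse relies on is simply pointing an OpenAI client at an OpenAI-compatible endpoint. A minimal sketch of the idea, assuming a local Ollama server (the URL, key, and model name below are placeholders, and the mcp-agent config wiring is not shown):

from openai import OpenAI

# Any OpenAI-compatible provider can be targeted by swapping base_url;
# this is the same idea the base_url config option builds on.
client = OpenAI(
    base_url="http://localhost:11434/v1",  # e.g. a local Ollama server
    api_key="ollama",  # many compatible servers accept any non-empty key
)

response = client.chat.completions.create(
    model="llama3.2",  # placeholder model name
    messages=[{"role": "user", "content": "Hello from an OpenAI-compatible provider"}],
)
print(response.choices[0].message.content)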
Maybe integrate with this? https://www.litellm.ai/ supports 100+ LLMs.
I think that will be great @hrishikeshio
AWS SageMaker would be great to have. Here is the code I have for the SageMaker client. How can I leverage this code?
import json  # self.client is assumed to be a boto3 "sagemaker-runtime" client

def get_streaming_response(self, prompt):
    """
    Sends a structured prompt to the SageMaker LLM endpoint and streams the response.
    :param prompt: The structured prompt formatted as a JSON list.
    :return: The boto3 EventStream carrying the streamed response.
    """
    try:
        # Define inference parameters with streaming enabled
        inference_params = {
            "do_sample": True,
            "temperature": 0.1,
            "top_k": 50,
            "max_new_tokens": 512,
            "repetition_penalty": 1.03,
            "stop": ["</s>", "<|system|>", "<|user|>", "<|assistant|>"],
            "return_full_text": False,
        }
        body = json.dumps({"inputs": prompt, "parameters": inference_params, "stream": True})

        # Invoke the SageMaker endpoint with response streaming
        response = self.client.invoke_endpoint_with_response_stream(
            EndpointName=self.endpoint_name,
            Body=body,
            ContentType="application/json",
        )
        event_stream = response["Body"]
        return event_stream
    except Exception as exc:
        raise RuntimeError(f"Failed to invoke SageMaker endpoint {self.endpoint_name}") from exc
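For what it's worth, the returned EventStream can be consumed roughly like this; this is only a sketch, and the exact payload format inside each part depends on the model container behind the endpoint:

# Iterate the EventStream returned above; each event wraps a PayloadPart of raw bytes
for event in event_stream:
    payload_part = event.get("PayloadPart")
    if payload_part:
        chunk = payload_part["Bytes"].decode("utf-8")
        print(chunk, end="")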
Hi @aatish-shinde,
I'm a little confused, are you fine-tuning an LLM with sagemaker?
From initial Google searches, invoke_endpoint_with_response_stream seems to be a method on the AWS Bedrock client?
- https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html
- https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime/client/invoke_model_with_response_stream.html
@MattMorgis No, I am not fine-tuning. But I think creating the Bedrock and SageMaker clients works the same way; you just have to mention which AWS resource you are calling. You can actually create the client using sagemaker-runtime too, like this:
self.client = boto3.client('sagemaker-runtime', region_name="us-east-2")

# Invoke the SageMaker endpoint with response streaming
response = self.client.invoke_endpoint_with_response_stream(
    EndpointName=self.endpoint_name,
    Body=body,
    ContentType="application/json",
)
@saqadri I could use some input.
There are two ways I could possibly take this:
- An AWSBedrockAugmentedLLM and AzureOpenAIAugmentedLLM. The Bedrock one, for example, could expose access to Claude, Llama, Titan, Mistral, or any others in Bedrock. It would use boto3 and the Bedrock client (see the sketch below).
- Update the existing AnthropicAugmentedLLM and OpenAIAugmentedLLM. As you mentioned, they can be reused completely; the only difference for both is authentication. AnthropicAugmentedLLM would continue to use the Anthropic library.
I guess it really boils down to whether the framework wants to support by:
- model: AnthropicAugmentedLLM to use Claude via Anthropic directly, AWS Bedrock, or Google Vertex.
- provider: choose your model based on your provider: AnthropicAugmentedLLM, AWSBedrockAugmentedLLM, GoogleVertexAugmentedLLM.
Let me know if that makes sense?
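To make option 1 concrete, here is a minimal sketch; the class and method names are invented for illustration and are not existing mcp-agent classes, though the boto3 calls and the Claude-on-Bedrock request body follow the documented formats:

import json
import boto3

class AWSBedrockAugmentedLLM:  # hypothetical class name from option 1 above
    def __init__(self, region: str = "us-east-1"):
        self.client = boto3.client("bedrock-runtime", region_name=region)

    def generate_str(self, message: str) -> str:  # hypothetical method name
        # invoke_model requires a model-specific request body; this one uses the
        # Anthropic Messages format expected by Claude models on Bedrock
        body = json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": [{"role": "user", "content": message}],
        })
        response = self.client.invoke_model(
            modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
            body=body,
        )
        payload = json.loads(response["body"].read())
        return payload["content"][0]["text"]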
@aatish-shinde What is self.endpoint_name defined as? What model are you using?
@MattMorgis It is just a name. I mean, I can name it "abcd". You usually just create an endpoint with a name in AWS SageMaker and use the boto3 client to invoke it:
boto3.client('sagemaker-runtime', region_name="us-east-2")
There is also another way of invoking it:
POST /endpoints/EndpointName/invocations-response-stream HTTP/1.1
https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpointWithResponseStream.html
I think for AWS Bedrock you do something like this, which is similar:
client = boto3.client(service_name='bedrock-runtime')
response = client.invoke_model_with_response_stream(
    modelId='anthropic.claude-v2',
    body=body,
)
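As a sketch, the streamed body can then be read chunk by chunk; for the claude-v2 text-completion API each chunk is JSON with a completion field, though other models use different payload shapes:

import json

# Iterate the Bedrock streaming response; each event carries a "chunk" of JSON bytes
for event in response["body"]:
    chunk = event.get("chunk")
    if chunk:
        data = json.loads(chunk["bytes"])
        print(data.get("completion", ""), end="")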
@MattMorgis I thought about this some more, and I think it makes sense to expose it by provider, even if the core implementations end up sharing the same base classes. Currently, I see providers as supporting multiple models via the same API schema (e.g. OpenAI has a bunch of models available via the same interface, and Anthropic and Together AI do as well). Some providers have also standardized on another model provider's API schema (e.g. OpenAI-compatible API endpoints from Azure).
So instead of having an LLaMAAugmentedLLM, it would be better to choose the provider (e.g. TogetherAugmentedLLM) and specify the model preferences or model ID (just like we do with RequestParams).
Might be worth taking a look at : https://github.com/evalstate/fast-agent/blob/935e6c627158438c8df488541e63a20802b18720/src/mcp_agent/workflows/llm/model_factory.py#L4
This supports a dot notation for provider.model.reasoning-level
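For illustration only, a rough sketch of how such a dot-notation string could be parsed; the names here are made up and are not fast-agent's actual implementation:

from dataclasses import dataclass

@dataclass
class ModelSpec:  # hypothetical container for a parsed model string
    provider: str
    model: str
    reasoning_effort: str | None = None

def parse_model_string(spec: str) -> ModelSpec:
    """Parse 'provider.model' or 'provider.model.reasoning-level' (illustrative only)."""
    parts = spec.split(".")
    if len(parts) < 2:
        raise ValueError(f"Expected at least 'provider.model', got: {spec!r}")
    reasoning = parts[2] if len(parts) > 2 else None
    return ModelSpec(provider=parts[0], model=parts[1], reasoning_effort=reasoning)

# e.g. parse_model_string("openai.o3-mini.high") -> ModelSpec("openai", "o3-mini", "high")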
Just want to add some input that might be helpful on this topic. The various models in Bedrock do not have a unified API; Claude requires a different input structure than Llama, for example. AWS recently addressed this by releasing the Converse API, which works with most models (I believe it works with any model that supports tool use) but still leaves room for model-specific inference parameters.
I would highly recommend implementing this feature using the Converse API, since everything is moving in that direction and it will be significantly less work because a single interface can prompt many models. If you use the invoke_model API, it will most likely require a custom implementation for every model.
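For reference, a minimal Converse call looks roughly like this; the model ID and parameters below are just example values, but the request/response shape is the same across supported Bedrock models:

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# The same messages/inferenceConfig shape works for Claude, Llama, Mistral, etc.;
# additionalModelRequestFields can carry any model-specific inference parameters
response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # example model ID
    messages=[{"role": "user", "content": [{"text": "Summarize MCP in one sentence."}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.1},
)
print(response["output"]["message"]["content"][0]["text"])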
@MattMorgis I'm very interested in this feature since I can't really use this framework until there is Bedrock support, and my company is about to rewrite one of our agentic apps very soon. I would really like to use this framework for that, but we are required to use only Bedrock. If you need any help or want to piece any of this feature out to speed it up, please let me know.
@BTripp1986 thanks for this suggestion! I can help prioritize this work to make sure you can use mcp-agent for the agent app rewrite. Please find me on discord (@saqadri) and we can chat through it as well.
Sounds great! Discord request sent from btrippcode
LiteLLM is an excellent choice; if we can support that, it opens up all the possibilities! Looking forward to it!
I could help you to test LiteLLM with Azure~
Do we support Claude on Vertex? It does not seem to be supported in GoogleAugmentedLLM, and I could not find Vertex-related code in AnthropicAugmentedLLM, so I'm thinking not? I was confused because I saw something to the contrary in the benchmarks JSON file...
Hmm, good question @rahuldave. I'll investigate the best way to support this and get back to you. We probably need to enable the Vertex SDK.
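If it helps, the Anthropic Python SDK already ships a Vertex client, so enabling this would likely build on something like the following sketch; the project, region, and model values are placeholders:

from anthropic import AnthropicVertex

# Authenticates via standard Google Cloud credentials (application default credentials)
client = AnthropicVertex(project_id="my-gcp-project", region="us-east5")

message = client.messages.create(
    model="claude-3-5-sonnet-v2@20241022",  # Vertex model IDs use the name@date form
    max_tokens=512,
    messages=[{"role": "user", "content": "Hello from Claude on Vertex"}],
)
print(message.content[0].text)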