
Please help. How can I integrate MCP with a model deployed on AWS SageMaker? I'd much appreciate your help.

aatish-shinde opened this issue Mar 13 '25 · 4 comments

I have a model deployed on AWS SageMaker and would like to use MCP with that model. Is that possible?

aatish-shinde · Mar 13 '25 14:03

Hi @aatish-shinde thank you for your question! This is related to #40, which @MattMorgis is working on.

If the API is OpenAI-compatible, then you can also set the base_url endpoint to point at the instance and use that (though I don't think SageMaker in particular exposes one). Please see this example of using mcp-agent with Ollama models: https://github.com/lastmile-ai/mcp-agent/blob/main/examples/mcp_basic_ollama_agent/mcp_agent.config.yaml#L24.
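
For illustration, here's a minimal sketch of what that looks like from client code, assuming an OpenAI-compatible server such as Ollama's /v1 API (the URL, api_key, and model name below are placeholders for whatever your deployment exposes):

from openai import OpenAI

# Point the standard OpenAI client at a self-hosted, OpenAI-compatible
# endpoint. Ollama's default local address is shown; many self-hosted
# servers ignore the API key but the client requires a non-empty value.
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",
)

response = client.chat.completions.create(
    model="llama3.2",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)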

saqadri · Mar 13 '25 15:03

@saqadri Thanks for the reply. I already have a SageMaker client class that uses the boto3 library, and I would like to use this class to get responses from the LLM. Any idea if that's possible? It looks like this:

import json
import logging
import os

import boto3

logger = logging.getLogger(__name__)

# Endpoint name is read from the environment (e.g. via a .env file)
SAGEMAKER_ENDPOINT = os.getenv("SAGEMAKER_ENDPOINT")


class SageMakerClient:
    def __init__(self):
        self.logger = logger
        self.logger.info("Initializing SageMakerClient...")
        self.client = boto3.client("sagemaker-runtime", region_name="us-east-2")
        self.endpoint_name = SAGEMAKER_ENDPOINT
        if not self.endpoint_name:
            self.logger.warning("⚠️ SageMaker endpoint is not configured. Set SAGEMAKER_ENDPOINT in .env.")

    def get_streaming_response(self, prompt):
        """
        Sends a structured prompt to the SageMaker LLM endpoint and streams the response.

        :param prompt: The structured prompt formatted as a JSON list.
        :return: The botocore EventStream from the endpoint, or an error string on failure.
        """
        try:
            # Inference parameters (TGI-style) with streaming enabled
            inference_params = {
                "do_sample": True,
                "temperature": 0.1,
                "top_k": 50,
                "max_new_tokens": 512,
                "repetition_penalty": 1.03,
                "stop": ["</s>", "<|system|>", "<|user|>", "<|assistant|>"],
                "return_full_text": False
            }

            body = json.dumps({"inputs": prompt, "parameters": inference_params, "stream": True})

            # Invoke the SageMaker endpoint with response streaming
            response = self.client.invoke_endpoint_with_response_stream(
                EndpointName=self.endpoint_name,
                Body=body,
                ContentType="application/json"
            )

            # "Body" is a botocore EventStream of PayloadPart chunks
            return response["Body"]

        except Exception as e:
            self.logger.error(f"🚨 SageMaker error: {e}")
            return "Error processing request. Please try again."

aatish-shinde · Mar 13 '25 15:03
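
An aside on the snippet above: the EventStream it returns still has to be decoded before the text can be consumed. Here's a minimal sketch of one way to do that, assuming a TGI-style container (which the do_sample/top_k/return_full_text parameters suggest) that emits server-sent-event lines like data: {"token": {"text": "..."}}; the iter_tokens helper is hypothetical, not part of any library:

import json

def iter_tokens(event_stream):
    """Yield text chunks from a SageMaker response stream.

    Assumes a TGI-style container emitting lines such as:
        data: {"token": {"text": "..."}}
    """
    buffer = b""
    for event in event_stream:
        part = event.get("PayloadPart")
        if not part:
            continue
        buffer += part["Bytes"]
        # Payload parts can split mid-line; keep the incomplete tail buffered
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            line = line.strip()
            if not line.startswith(b"data:"):
                continue
            payload = json.loads(line[len(b"data:"):])
            yield payload["token"]["text"]

Usage would look something like:

sm = SageMakerClient()
stream = sm.get_streaming_response("Hello!")
full_text = "".join(iter_tokens(stream))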

@MattMorgis do you have an update on supporting AWS-hosted models? Or, more generally, any suggestions for @aatish-shinde? I can look into this tomorrow, @aatish-shinde.

saqadri · Mar 13 '25 18:03

@saqadri We're having this same conversation over in #40.

mattmorgis · Mar 13 '25 19:03