
Create representations for hosted MCP servers and content

Open stephentoub opened this issue 10 months ago • 12 comments

AI services are starting to support server-side use of MCP servers. For example, Anthropic's service: https://docs.anthropic.com/en/docs/agents-and-tools/mcp-connector and OpenAI responses: https://platform.openai.com/docs/guides/tools-remote-mcp

With the ModelContextProtocol library, it's easy to use any MCP server, including stdio ones, locally, treating every tool as an AIFunction, but there's not currently an abstraction for the hosted MCP server case, where the service uses the server directly.

  • [ ] Rough sketch of how we could handle this well:
    1. [x] Add a HostedMcpServer : AITool. This would be configurable with all the common stuff: server url, optional list of allowed tool names, indication of whether to allow auto invocation, etc.
    2. [x] Add a HostedMcpServerToolCall : AIContent to represent the call the service makes to the server (including tool name and arguments) and similarly a HostedMcpServerToolResult to represent the result of the operation.
    3. [x] The MCP spec recommends human in the loop on tool calls, and the OpenAI design defaults to not automatically invoking tools; interestingly Anthropic's doesn't, but I suspect that's coming. We might also need an AIContent to represent approval/denial for the server invoking a server-side tool.
    4. [ ] For IChatClient implementations that don't have this capability, we can enable it via an McpServerChatClient : IChatClient that itself uses MCP clients. It would translate a HostedMcpServer tool into creating an McpClient and replacing the tool in the tool collection with the appropriate McpClientTool instances. With a FunctionInvokingChatClient in the pipeline, it would then enable similar automatic handling of MCP server interactions.
    5. [x] Update FunctionInvokingChatClient to support automatic approval.
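
As a rough illustration of step 4, the tool-replacement idea might look something like the sketch below. Every type here (ToolSketch, LocalToolSketch, HostedMcpServerSketch, HostedToolExpander) is a hypothetical stand-in written for this example, not a Microsoft.Extensions.AI or ModelContextProtocol type, and the listTools callback stands in for "create an MCP client for the server and list its tools".

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

var tools = new ToolSketch[]
{
    new LocalToolSketch("get_time"),
    new HostedMcpServerSketch("everything", new Uri("https://example.com/mcp"), new[] { "echo" }),
};

// The callback fakes a server that exposes two tools; only "echo" is on the allow list.
var expanded = HostedToolExpander.Expand(tools, _ => new[] { "echo", "add" });
Console.WriteLine(string.Join(", ", expanded.Select(t => t.Name))); // prints: get_time, echo

// Hypothetical stand-in types, for illustration only.
abstract record ToolSketch(string Name);
record LocalToolSketch(string Name) : ToolSketch(Name);
record HostedMcpServerSketch(string Name, Uri Url, IReadOnlyList<string>? AllowedTools = null)
    : ToolSketch(Name);

static class HostedToolExpander
{
    // Replaces each hosted-server entry with one local tool per allowed server tool,
    // and passes every other tool through unchanged.
    public static List<ToolSketch> Expand(
        IEnumerable<ToolSketch> tools,
        Func<Uri, IEnumerable<string>> listTools)
    {
        var result = new List<ToolSketch>();
        foreach (var tool in tools)
        {
            if (tool is HostedMcpServerSketch hosted)
            {
                var names = listTools(hosted.Url)
                    .Where(n => hosted.AllowedTools is null || hosted.AllowedTools.Contains(n));
                result.AddRange(names.Select(n => new LocalToolSketch(n)));
            }
            else
            {
                result.Add(tool);
            }
        }
        return result;
    }
}
```

With the hosted entry expanded into ordinary local tools, a function-invoking layer further down the pipeline would see nothing special about them, which is the point of the proposed wrapper.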

Additional details originally captured in Follow-ups for MCP tool (#6779):

  • [ ] https://github.com/dotnet/extensions/pull/6664 adds the MCP tool and supporting types as experimental. Some things to follow-up on before marking them stable:
    1. [ ] Do we need the base "user input" / "user output" types?
    2. [x] The OpenAI Responses implementation currently looks for the tool approval as part of a Tool message. Is that right? Should it be User instead? Should it be any kind of message?
    3. [ ] We need to be stricter about which roles can contain approval responses: only user roles should be allowed. This applies to both MCP and FICC. We can reconsider supporting other roles in the future: https://github.com/dotnet/extensions/pull/6881#discussion_r2419557556.
    4. [x] Responses exposes a single MCP tool call message / instance, rather than a separate one for call and a separate one for result. We instead currently model it as two types, one for request and one for result. Is that the right split?
    5. [ ] The naming of "id" parameters is confusing; it's not clear which ID they're referring to, and more than once I passed in the wrong ID. We should revisit the naming.
    6. [x] We need to ensure the hosted MCP tool works with variations, like OpenAI's connectors. This might entail replacing the Url property with something more general, possibly just a rename.
    7. [x] In McpServerToolCallContent, it is too harsh to require the server name, since it is only useful for disambiguating tool name collisions. We need to revisit which properties are required; at the bare minimum we should keep CallId.
    8. [ ] Consider replacing Mcp approval types with Function approval ones: https://github.com/dotnet/extensions/issues/6492#issuecomment-3499102536.
    9. [ ] Consider introducing a type for representing Server-side calls to avoid needing to introduce specialized contents for everything of this class e.g. OpenAPI and MCP: https://github.com/dotnet/extensions/issues/6492#issuecomment-3499160008.
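
To make follow-up 3 concrete, here is a minimal sketch of what "only user roles may carry approval responses" could mean in practice. RoleSketch, MessageSketch, ApprovalResponseSketch, and ApprovalFilter are hypothetical names invented for this example; the real check would live wherever the library processes approval responses.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

var messages = new List<MessageSketch>
{
    // An approval response smuggled into a Tool message is ignored.
    new(RoleSketch.Tool, new object[] { new ApprovalResponseSketch("approval_1", true) }),
    // Only the one carried by a User message is honored.
    new(RoleSketch.User, new object[] { new ApprovalResponseSketch("approval_2", true) }),
};

foreach (var accepted in ApprovalFilter.AcceptedResponses(messages))
{
    Console.WriteLine(accepted.Id); // prints: approval_2
}

// Hypothetical stand-in types, for illustration only.
enum RoleSketch { User, Assistant, Tool }
record MessageSketch(RoleSketch Role, IReadOnlyList<object> Contents);
record ApprovalResponseSketch(string Id, bool Approved);

static class ApprovalFilter
{
    // Returns approval responses found in user-role messages and skips all others.
    public static IEnumerable<ApprovalResponseSketch> AcceptedResponses(
        IEnumerable<MessageSketch> messages) =>
        messages
            .Where(m => m.Role == RoleSketch.User)
            .SelectMany(m => m.Contents)
            .OfType<ApprovalResponseSketch>();
}
```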

stephentoub, Jun 04 '25 02:06

Do we need the base "user input" / "user output" types?

Aside from what I said in https://github.com/dotnet/extensions/issues/6779#issuecomment-3271400229, I'm not opposed to removing them. I think that could also alleviate 4. "The naming of "id" parameters is confusing".

Responses exposes a single MCP tool call message / instance, rather than a separate one for call and a separate one for result. We instead currently model it as two types, one for request and one for result. Is that the right split?

I resonate with Peder on https://github.com/dotnet/extensions/issues/6779#issuecomment-3308670694 and I don't think we need to change it. I will strike this one if you all agree.

jozkee, Oct 08 '25 23:10

Do we need the base "user input" / "user output" types?

We need them. Only debate is naming I think.

stephentoub, Oct 08 '25 23:10

I think we should also consider renaming the McpServerToolXx types to be HostedMcpServerToolXx, e.g. McpServerToolCallContent => HostedMcpServerToolCallContent. That way it's clear they're directly associated with the HostedMcpServerTool, and differentiates from other MCP-related things. We'd do the same when we add call/result content for other hosted tool types.

stephentoub, Oct 17 '25 21:10

@stephentoub yes, that's what we had originally.

We'd do the same when we add call/result content for other hosted tool types.

Do you have something concrete that wouldn't fit nicely with the current naming?

jozkee, Oct 20 '25 15:10

Do you have something concrete that wouldn't fit nicely with the current naming?

I'm not sure what you mean.

When we add call/result types for HostedCodeInterpreterTool, HostedWebSearchTool, etc., what will they be called? e.g. WebSearchToolCallContent or HostedWebSearchToolCallContent.

stephentoub, Oct 20 '25 15:10

Do you have something concrete that wouldn't fit nicely with the current naming?

I'm not sure what you mean.

I somehow missed this. I was thinking you had new tools on your radar that would raise a similar naming concern. The existing tools you mentioned are also valid examples and answered my question, thanks.

jozkee, Oct 25 '25 02:10

@jozkee, we should also consider whether McpServerToolApprovalRequestContent and McpServerToolApprovalResponseContent are really necessary. Could we instead just use FunctionApprovalRequestContent/FunctionApprovalResponseContent, with the Call property just typed as AIContent instead of strongly-typed to FunctionCallContent, such that they could then be used for any call approval? If in the future there are other server-side tools that require approval, it'd be nice if we didn't have to create new xxApprovalRequest/ResponseContent types for each.
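
A minimal sketch of the idea, assuming the approval request's Call property is loosened to a base content type. All types below are hypothetical stand-ins written for this example and are not the actual FunctionApprovalRequestContent/FunctionApprovalResponseContent definitions.

```csharp
#nullable enable
using System;

ContentSketch[] calls =
{
    new FunctionCallSketch("call_1", "get_weather"),
    new McpToolCallSketch("call_2", "echo", ServerName: "everything"),
};

for (int i = 0; i < calls.Length; i++)
{
    // One request/response pair serves both kinds of call.
    var request = new ApprovalRequestSketch($"approval_{i + 1}", calls[i]);
    var response = request.CreateResponse(approved: true);
    Console.WriteLine($"{response.Id}: {ApprovalDescriber.Describe(response.Call)}");
}

// Hypothetical stand-in types, for illustration only.
abstract record ContentSketch;
record FunctionCallSketch(string CallId, string Name) : ContentSketch;
record McpToolCallSketch(string CallId, string ToolName, string? ServerName) : ContentSketch;

// Call is typed as the base content, so any kind of call can be approved.
record ApprovalRequestSketch(string Id, ContentSketch Call) : ContentSketch
{
    public ApprovalResponseSketch CreateResponse(bool approved) => new(Id, approved, Call);
}

record ApprovalResponseSketch(string Id, bool Approved, ContentSketch Call) : ContentSketch;

static class ApprovalDescriber
{
    // Consumers that care about the concrete call type pattern-match on it.
    public static string Describe(ContentSketch call) => call switch
    {
        FunctionCallSketch f => $"local function {f.Name}",
        McpToolCallSketch m => $"MCP tool {m.ToolName}",
        _ => "unknown call",
    };
}
```

The trade-off in this shape is that callers lose the strongly-typed FunctionCall access and have to pattern-match on the call, in exchange for not needing a new request/response pair per tool kind.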

cc: @westey-m, @eavanvalkenburg

stephentoub, Nov 06 '25 19:11

@jozkee, we should think more as well about the representation of server-side calls. It'd be nice if we didn't need to keep introducing types for different kinds of server-side function invocations, e.g. MCP vs OpenAPI vs something else. Could we come up with some base type or something that represents the majority case and we'd only need to derive for additional info, or just use AdditionalProperties?

stephentoub, Nov 06 '25 19:11

I think using FunctionApprovalRequestContent/FunctionApprovalResponseContent would be the right way to go for MCP tools. Is this working right now? I was playing around with wrapping the tools from my MCP client in an ApprovalRequiredAIFunction, but this does nothing. Am I right, from what I read here, that approval for MCP calls is not supported yet?

daschuchmann, Nov 15 '25 20:11

I was playing around with wrapping the tools from my MCP client in an ApprovalRequiredAIFunction, but this does nothing. Am I right, from what I read here, that approval for MCP calls is not supported yet?

@daschuchmann, it is supported, both for remote MCP servers and for MCP clients that provide their tools through ChatOptions. The latter sounds like what you are doing:

using Microsoft.Extensions.AI;
using ModelContextProtocol.Client;
using OpenAI;

#pragma warning disable OPENAI001 // Type is for evaluation purposes only and is subject to change or removal in future updates. Suppress this diagnostic to proceed.
#pragma warning disable MEAI001 // Type is for evaluation purposes only and is subject to change or removal in future updates. Suppress this diagnostic to proceed.

var chatClient = new OpenAIClient(Environment.GetEnvironmentVariable("OPENAI_KEY"))
    .GetOpenAIResponseClient("gpt-4o-mini")
    .AsIChatClient()
    .AsBuilder()
    .UseFunctionInvocation()
    .Build();

await using var mcpClient = await McpClient.CreateAsync(new StdioClientTransport(new()
{
    Command = "npx",
    Arguments = ["-y", "@modelcontextprotocol/server-everything"],
}));

IList<McpClientTool> tools = await mcpClient.ListToolsAsync();
List<AITool> functionApprovalTools = new(tools.Count);

foreach (var tool in tools)
{
    Console.WriteLine($"Tool: {tool.Name}, Description: {tool.Description}");
    functionApprovalTools.Add(new ApprovalRequiredAIFunction(tool)); // Wrap tools to require approval.
}

List<ChatMessage> messages = new()
{
    new ChatMessage(ChatRole.User, "Use the echo tool to echo 'Hello world'"),
};

ChatResponse response = await chatClient.GetResponseAsync(messages, new() { Tools = [.. functionApprovalTools] });

var approvalRequest = response.Messages
    .SelectMany(m => m.Contents)
    .OfType<FunctionApprovalRequestContent>()
    .First(); // I expect only one approval request in this scenario.

Console.WriteLine($"Creating approval response for {approvalRequest.FunctionCall.Name}.");
messages.Add(new ChatMessage(ChatRole.User, [approvalRequest.CreateResponse(approved: true)]));

Console.WriteLine(await chatClient.GetResponseAsync(messages, new() { Tools = [.. functionApprovalTools] }));

jozkee, Nov 16 '25 01:11

@jozkee this worked for me, thank you for the example code. I had a small issue in my code but was close :)

daschuchmann, Nov 18 '25 17:11

we should think more as well about the representation of server-side calls.

@stephentoub, I think it is reasonable to add ServerSideContent : AIContent, which can be extended by the MCP and CodeInterpreter call and result contents, can be leveraged by @javiercn's AGUIChatClient, and can remove the workaround of hiding FCCs from FICC.

ServerSideContent would probably just have a CallId property.
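
Roughly, and only as a sketch: the stand-in types below are hypothetical (they are not the Microsoft.Extensions.AI types), and the LocalInvoker helper is invented here to show how a function-invoking layer could tell local calls apart from server-side ones by base type instead of having them hidden.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

var contents = new List<ContentSketch>
{
    new FunctionCallSketch("call_1", "get_weather"),
    new McpToolCallSketch("call_2", "echo"),
    new CodeInterpreterCallSketch("call_3", "print(1 + 1)"),
};

// Only the local function call is left for the client to invoke.
foreach (var call in LocalInvoker.CallsToInvoke(contents))
{
    Console.WriteLine($"invoke locally: {call.Name}");
}

// Server-side calls remain visible, identified through the shared CallId.
Console.WriteLine(string.Join(", ", LocalInvoker.ServerSideCallIds(contents)));

// Hypothetical stand-in types, for illustration only.
abstract record ContentSketch;
record FunctionCallSketch(string CallId, string Name) : ContentSketch;

// The proposed base: just a CallId, extended per server-side tool kind.
abstract record ServerSideContentSketch(string CallId) : ContentSketch;
record McpToolCallSketch(string CallId, string ToolName) : ServerSideContentSketch(CallId);
record CodeInterpreterCallSketch(string CallId, string Code) : ServerSideContentSketch(CallId);

static class LocalInvoker
{
    // Local function calls are the only contents the client should execute itself.
    public static IEnumerable<FunctionCallSketch> CallsToInvoke(IEnumerable<ContentSketch> contents) =>
        contents.OfType<FunctionCallSketch>();

    // Anything deriving from the server-side base was already handled by the service.
    public static IEnumerable<string> ServerSideCallIds(IEnumerable<ContentSketch> contents) =>
        contents.OfType<ServerSideContentSketch>().Select(c => c.CallId);
}
```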

jozkee, Dec 02 '25 20:12