modelcontextprotocol
RFC: Client / Server Content capabilities
This PR adds a new contentTypes capability to ClientCapabilities and a generatesHint Tool Annotation, allowing Clients to advertise which MIME types they can render to Users and which they can tokenize for LLM consumption. It also allows Tools to advertise the content types they may generate in a CallToolResult.
This enhancement works with the existing annotations system to optionally enable MCP Servers to adapt their content delivery to best match Host capabilities.
Motivation and Context
Different Host application/LLM pairs have different content handling requirements and capabilities (e.g. Chat Applications, IDEs, Video/Content Editing Suites, Agentic Applications).
This addition allows MCP Servers to make informed decisions to:
- Select optimal formats for content
- Use audience annotations more effectively
- Gracefully enhance or degrade based on Host needs/preferences.
Update 2025-04-19: The addition also enhances interoperability for implementors of the A2A protocol, which defines input and output modes for Agents. See AgentCard here and AgentSkill here.
How Has This Been Tested?
The extension has not been directly tested; however, some example scenarios are:
- A Host application that can render but not tokenize audio/video can receive adapted Tool Results and Text Content for the LLM.
- An MCP Server can choose to return either application/pdf content or downgrade to text/plain based on LLM capabilities.
- An MCP Server can provide additional instructions in the Tool Result to guide the User to obtain content that could not otherwise be rendered/processed.
- A client may choose not to include Tools that generate content types that cannot be handled.
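The PDF downgrade scenario can be sketched as follows, assuming the Client sent the proposed contentTypes.tokenizes list. ContentTypesHint, mimeMatches and chooseDocumentFormat are hypothetical names for illustration (not part of any SDK), and treating "type/*" entries as wildcards is an assumption.

```typescript
// Proposed (draft) contentTypes capability; both lists are optional, non-exclusive hints.
interface ContentTypesHint {
  renders?: string[];
  tokenizes?: string[];
}

// True if mimeType is listed exactly, or its major type is listed as "type/*" (assumed wildcard form).
function mimeMatches(advertised: string[], mimeType: string): boolean {
  const [major] = mimeType.split("/");
  return advertised.some((entry) => entry === mimeType || entry === `${major}/*`);
}

// Hypothetical policy: hand the LLM the PDF only when the Client says it can tokenize PDFs;
// otherwise downgrade to extracted plain text. A missing hint is treated as text-only.
function chooseDocumentFormat(hint?: ContentTypesHint): "application/pdf" | "text/plain" {
  const tokenizes = hint?.tokenizes ?? [];
  return mimeMatches(tokenizes, "application/pdf") ? "application/pdf" : "text/plain";
}

console.log(chooseDocumentFormat({ tokenizes: ["text/plain", "image/png"] })); // prints: text/plain
console.log(chooseDocumentFormat({ tokenizes: ["application/pdf"] })); // prints: application/pdf
```

Defaulting to text/plain when no hint is present keeps the sketch backwards compatible with Clients that never send the capability.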
Breaking Changes
The change is backwards compatible.
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [X] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [X] Documentation update
Checklist
- [X] I have read the MCP Documentation
- [X] My code follows the repository's style guidelines
- [X] New and existing tests pass locally
- [X] I have added appropriate error handling
- [X] I have added or updated documentation as needed
Additional context
This is not intended to be a complicated content-type negotiation protocol, but a simple way for participating Hosts and Servers to provide better User Experiences across a range of deployment scenarios. The list of MIME types is intended to be indicative, neither restrictive nor exhaustive.
An agreed convention for Resources where audience: [User], priority: 1 indicates content that should be rendered and not tokenized would further enhance the proposal. For example, a PDF could be sent for rendering, with the main content sent as text/plain for the LLM.
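On the Server side, the render-for-User, text-for-LLM split described above might look like the following. The types are simplified local copies of the 2025-03-26 schema shapes (EmbeddedResource, BlobResourceContents, TextContent, Annotations, where the User role is spelled "user"); the audience/priority convention itself is only proposed here, and pdfWithTextFallback is a hypothetical helper.

```typescript
type Role = "user" | "assistant";

interface Annotations {
  audience?: Role[];
  priority?: number; // 0..1, where 1 means most important
}

interface TextContent {
  type: "text";
  text: string;
  annotations?: Annotations;
}

interface BlobResourceContents {
  uri: string;
  mimeType?: string;
  blob: string; // base64-encoded bytes
}

interface EmbeddedResource {
  type: "resource";
  resource: BlobResourceContents;
  annotations?: Annotations;
}

type Content = TextContent | EmbeddedResource;

// PDF for the User to view (proposed convention: audience user + priority 1 = render, don't tokenize),
// plus the extracted text for the LLM.
function pdfWithTextFallback(uri: string, pdfBase64: string, extractedText: string): { content: Content[] } {
  return {
    content: [
      {
        type: "resource",
        resource: { uri, mimeType: "application/pdf", blob: pdfBase64 },
        annotations: { audience: ["user"], priority: 1 },
      },
      {
        type: "text",
        text: extractedText,
        annotations: { audience: ["assistant"] },
      },
    ],
  };
}
```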
~~I do not think a reciprocal server capability is necessary, as "Roots" provide the ability for the Host to provide arbitrary content to the server.~~
Update 2025-04-19
~~After consideration, a Server "generates" capability is appropriate. By convention Servers that support "Structured Outputs" would advertise "application/json" in their generates list.~~
Update 2025-04-24
Migrated Server "generates" capability to a generatesHint in ToolAnnotation. By convention Servers that support "Structured Outputs" would advertise the content type (e.g. application/json or application/xml). This would be compatible with the potential addition of a Schema related to this tool.
This PR has been opened for discussion and refinement, with additional documentation to be prepared if there is agreement in principle.
A draft supplement is provided below for inclusion in the documentation, once the right place is identified, if we progress with this PR.
ClientCapabilities contentTypes
- renders[]: A non-exclusive list of MIME types intended to help Servers make informed decisions about content format selection. Servers may use this information along with audience annotations to target content appropriately. For example: ["text/plain","image/png","video/mp4"]
- tokenizes[]: A non-exclusive list of MIME types to inform Servers what content types can be included in the LLM's context window. Servers may adapt their responses based on this, such as providing alternative formats or using audience annotations to prioritise content appropriately. For example: ["text/plain","image/png"]
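The two lists can be sketched as a TypeScript type, populated with the example values above. ContentTypesCapability and renderOnlyTypes are illustrative names, not taken from the published schema; the "render-only" set corresponds to the first testing scenario (content the Host can show the User but cannot put in the LLM's context window).

```typescript
// Draft shape from this PR: both lists are non-exclusive hints, not guarantees.
interface ContentTypesCapability {
  renders?: string[]; // MIME types the Host can show to the User
  tokenizes?: string[]; // MIME types that can go into the LLM's context window
}

interface ClientCapabilities {
  roots?: { listChanged?: boolean }; // existing capability, shown for context
  contentTypes?: ContentTypesCapability; // proposed
}

// Example capabilities a Host might send during initialization.
const capabilities: ClientCapabilities = {
  contentTypes: {
    renders: ["text/plain", "image/png", "video/mp4"],
    tokenizes: ["text/plain", "image/png"],
  },
};

// Types the Host can render but not tokenize: candidates for audience-annotated
// content accompanied by a text alternative for the LLM.
function renderOnlyTypes(caps: ClientCapabilities): string[] {
  const renders = caps.contentTypes?.renders ?? [];
  const tokenizes = new Set(caps.contentTypes?.tokenizes ?? []);
  return renders.filter((t) => !tokenizes.has(t));
}

console.log(renderOnlyTypes(capabilities)); // prints: [ 'video/mp4' ]
```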
~~## ServerCapabilities contentTypes~~
~~- generates[]: A non-exclusive list of MIME types to inform Hosts what content types may be generated by a Server in a CallToolResult. Servers that support Structured Output SHOULD advertise application/json here.~~
ToolAnnotation generatesHint
- generatesHint: A non-exclusive list of MIME types to inform Hosts what content types may be generated by a Server in a CallToolResult. Servers that support Structured Output SHOULD advertise the appropriate output (e.g. application/json) here.
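A sketch of how the proposed generatesHint could sit alongside the existing ToolAnnotations hints (title, readOnlyHint, destructiveHint, idempotentHint, openWorldHint), combined with the earlier scenario of a Client leaving out Tools whose output it cannot handle. The render_chart Tool and the usableTools policy (keep a Tool if any hinted type is handled; keep Tools with no hint) are hypothetical.

```typescript
// Existing ToolAnnotations hints, plus the proposed generatesHint.
interface ToolAnnotations {
  title?: string;
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
  generatesHint?: string[]; // proposed: MIME types a CallToolResult may contain
}

interface Tool {
  name: string;
  description?: string;
  inputSchema: { type: "object"; properties?: Record<string, object> };
  annotations?: ToolAnnotations;
}

const chartTool: Tool = {
  name: "render_chart",
  description: "Plots a data series",
  inputSchema: { type: "object", properties: { series: { type: "array" } } },
  annotations: { readOnlyHint: true, generatesHint: ["image/png", "application/json"] },
};

// Hypothetical Client-side policy: keep a Tool if at least one hinted type can be
// rendered or tokenized. Tools without a hint are kept, since the hint is optional.
function usableTools(tools: Tool[], renders: string[], tokenizes: string[]): Tool[] {
  const handled = new Set([...renders, ...tokenizes]);
  return tools.filter((tool) => {
    const hint = tool.annotations?.generatesHint;
    return hint === undefined || hint.some((t) => handled.has(t));
  });
}
```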
Audience Annotations
For Resources annotated with audience=user, priority=1 the Host MAY choose not to present the full content of the Resource to the LLM.
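One simple way a Host might apply this, sketched under the assumption that "render-only" means the audience is exactly ["user"] and priority is 1. A Host MAY equally summarize or truncate instead of dropping; ContentItem is a trimmed stand-in for the schema's content types, and contentForModel is a hypothetical helper.

```typescript
type Role = "user" | "assistant";

interface Annotations {
  audience?: Role[];
  priority?: number;
}

// Trimmed stand-in for TextContent / ImageContent / EmbeddedResource, etc.
interface ContentItem {
  type: string;
  annotations?: Annotations;
}

// Proposed convention: audience is exactly the User and priority is 1.
function isRenderOnly(item: ContentItem): boolean {
  const a = item.annotations;
  if (a === undefined || a.priority !== 1) return false;
  return a.audience !== undefined && a.audience.length === 1 && a.audience[0] === "user";
}

// Items the Host forwards into the LLM's context window. Render-only items can still
// be displayed to the User; they are just not tokenized.
function contentForModel<T extends ContentItem>(content: T[]): T[] {
  return content.filter((item) => !isRenderOnly(item));
}
```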
The structured output piece strikes me as the biggest issue here - are you imagining that structured output always flows through an embedded resource? If not, the TextContent type is notably missing a mime type property, so as near as I can tell there's no mechanism to actually communicate a mime type like application/json to a client. If you're imagining this always goes through an embedded resource unless the return is explicitly text/plain, and only text/plain maps to the TextContent, based on the fact that many MCP servers are using the text type to return json, this proposal would double the number of calls for a json heavy MCP server if they wanted to adhere exactly to spec.
I'd also think it would be useful to have prescriptive documentation around this. I know this is a philosophical discussion, but is a PDF an image content type with a mime type of pdf or, since it's a flat file format, does it need to be an embedded resource? In the negotiation scenario, are you saying the tool call will either return a text type for text/plain (again in the example you gave), an image type for application/pdf, or an embedded resource, say if they're returning something structured?
High level, I like this idea as a negotiation, but there might need to be some supporting changes to handle the structured output piece efficiently, and the tools documentation would need to be updated with a minimal amount of guidance for how servers should treat these content type requests, AND there should be documentation for clients on recommendations for what is expected of the client / host if they send in a renders mime type.
The question on 371 was whether to use a TextResourceContents which has an optional MIME type and a uri. I would suggest continuing discussions on that aspect in #371, and consulting the schema for the data types under discussion.
A PDF is a binary object that would be delivered as a "BlobResourceContents" with a MIME type of application/pdf. LLM Tokenization support for this varies hence Servers may choose to upgrade or downgrade based on known capabilities.
Questions on structured output specifically should be raised on #371. These are optional capabilities that Host and MCP Server Implementors can take advantage of to build enhanced applications.
> TextContent type is notably missing a mime type property, so as near as I can tell there's no mechanism to actually communicate a mime type like application/json to a client
Per https://github.com/modelcontextprotocol/modelcontextprotocol/discussions/180#discussioncomment-12614185, I believe TextContent will support mimeType in the future.
The suggestion was to use an EmbeddedResource of TextResourceContents type which contains a mimeType, a uri and text for the content. I think a mimeType on the TextContent itself would [potentially] be a good addition, but there is an alternative to TextContent.
Edited to say that a mimeType on TextContent is potentially a good addition; my preference, though, would be to fix it as text/plain.
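The EmbeddedResource alternative described above can be sketched as follows: the MIME type travels on TextResourceContents rather than on TextContent. The types are simplified local copies of the schema; jsonResource and the example:// URI are hypothetical.

```typescript
interface TextResourceContents {
  uri: string;
  mimeType?: string;
  text: string;
}

interface EmbeddedResource {
  type: "resource";
  resource: TextResourceContents;
}

// Wrap a JSON-serializable value so its MIME type is carried with the content.
function jsonResource(uri: string, value: unknown): EmbeddedResource {
  return {
    type: "resource",
    resource: { uri, mimeType: "application/json", text: JSON.stringify(value) },
  };
}

console.log(jsonResource("example://weather/today", { tempC: 21 }).resource.text); // prints: {"tempC":21}
```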
> The question on 371 was whether to use a TextResourceContents which has an optional MIME type and a uri. I would suggest continuing discussions on that aspect in #371, and consulting the schema for the data types under discussion.
Ok, then this is basically dependent on #371 going through? That should be called out in the PR.
> A PDF is a binary object that would be delivered as a "BlobResourceContents" with a MIME type of application/pdf. LLM Tokenization support for this varies hence Servers may choose to upgrade or downgrade based on known capabilities.
This doesn't actually address the concern around documentation (though it nicely explains the thinking; again, this is an intrinsic dependency on #371). This PR has no documentation updates stating that a server SHOULD take these actions based on client-supported mime types. It also gives no guidance on the order of precedence. Speaking as someone working on the server side, without documentation it would be unclear how servers should respond to different client mime type capabilities, and support would be spotty. Should renders be preferred over tokenizes or vice versa, or should you return two content types if the renders and tokenizes sets differ? This circles back to your comments on the search PR suggesting prescriptive guidance for client developers on how to handle the search capability; that same sort of guidance is helpful on the server side for potentially contradictory client mime types.
This proposal has no dependency on #371 and predates it by 4 weeks.
Ah gotcha - then I guess the direct feedback is this PR should pickup the fields required to universally communicate back mime type on responses. Again, coming from the server side, I'm not sure how I'd support application/json or pdf (without looking at 371, which we're saying we shouldn't have to look at since it's not a dependency)
The content types are already within the protocol, and well documented here: https://modelcontextprotocol.io/specification/2025-03-26/server/resources
The terminology on MAY, SHOULD and so on are defined here: https://modelcontextprotocol.io/specification/2025-03-26
To put the request for documentation in context, here's the guidance offered by the HTTP spec on content types - it's multiple pages and includes recommendations on defaults, recommendations on sniffing, how to handle unknown responses from both the server and client side, and more. And, it's worth mentioning, the HTTP server use case is actually quite a bit simpler in so much as an http server really only has the ability to return a single result to a request and only gets a single accept header from the requestor. This is as opposed to MCP servers which can potentially return multiple responses and have a much more complex matrix of considerations for what to return, and with this PR actually present two distinct accept lists.
This PR creates a similar mechanism within MCP but without any actual documented guidance on what should be returned by default in the absence of accepted content types, which accept list should be given precedence, if it's acceptable or preferable to return multiple content blocks if multiple mime types are accepted by the client, etc.
Regarding the comment on content types being well documented, this is from your PR:
> contentTypes | Advertises content types the Server may generate in a CallToolResult
I think this is what confused me, because as it's written here contentTypes only partially applies to CallToolResults (since mime is notably absent from the text response). So, maybe a slight rewording, OR pull an optional mime type onto the TextContent as part of this PR?
Actually, this all brings up an interesting question - right now, mime types are primarily on resources (plus image content and audio content) - would a server ever change the resources it presents to the client based on the accept lists?
> I think this is what confused me, because as it's written here contentTypes only partially applies to CallToolResults (since mime is notably absent from the text response). So, maybe a slight rewording, OR pull an optional mime type onto the TextContent as part of this PR?
CallToolResult returns an array of content.
Yup, and my point is one of the array element options doesn't have a mime type.
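To make the gap concrete, here is a sketch over simplified copies of the 2025-03-26 content types: image, audio and embedded resources can report a MIME type, but TextContent cannot, so a consumer has to assume one (e.g. text/plain). declaredMimeType is a hypothetical helper.

```typescript
interface TextContent { type: "text"; text: string }
interface ImageContent { type: "image"; data: string; mimeType: string }
interface AudioContent { type: "audio"; data: string; mimeType: string }
interface EmbeddedResource {
  type: "resource";
  // In the schema exactly one of text/blob is present; both are optional here for brevity.
  resource: { uri: string; mimeType?: string; text?: string; blob?: string };
}
type Content = TextContent | ImageContent | AudioContent | EmbeddedResource;

// Returns the MIME type a content item declares, or undefined when it has no way to declare one.
function declaredMimeType(item: Content): string | undefined {
  switch (item.type) {
    case "image":
    case "audio":
      return item.mimeType; // required on these variants
    case "resource":
      return item.resource.mimeType; // optional on resource contents
    default:
      return undefined; // "text": TextContent has no mimeType field
  }
}
```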
> Actually, this all brings up an interesting question - right now, mime types are primarily on resources (plus image content and audio content) - would a server ever change the resources it presents to the client based on the accept lists?
From the introduction to this PR:
> This enhancement works with the existing annotations system to optionally enable MCP Servers to adapt their content delivery to best match Host capabilities.
"adapt their content delivery" means Servers adjusting the outputs of Prompts, Resources or Tools based on the content type hints from the Client. This is a good point to clarify for this discussion, thank you.
I've updated the comment in #371 to include an example CallToolResult and guidance text to make that clearer. Note this PR has been re-drafted and discussion moved to #356.
> This is as opposed to MCP servers which can potentially return multiple responses and have a much more complex matrix of considerations for what to return, and with this PR actually present two distinct accept lists.
This PR is not proposing "accept lists", but optional content type hints. Since this PR is adding optional hints to the existing protocol, it may be more appropriate to start a separate discussion in the forums on that topic and whether MCP should contain that guidance.
As other points of discussion for this PR, I'd like to also get feedback on:
- ~~Whether generates should be marked on individual Tools with an assumed default of text/plain.~~
- There was earlier discussion on a potential FileContent type that may make sense for larger items. @jerome3o-anthropic
- Similar to the above, whether any conventions should be applied to using Roots to transfer larger Resources (perhaps a specialization of FileContent).
- Linking to discussion on New Content Type for "UI" - will comment over on that thread - my understanding is the protocol already supports their need, but I want to confirm that understanding.
VS Code and many other clients allow users to change models on-the-fly, even in the same "chat session." With this proposal, if that happens, a client would need to stop and restart their MCP connection if they were to announce a different set of content types that they're able to tokenize. Since some servers can be stateful (e.g. playwright/puppeteer) this isn't something that can be done safely. I think we would need some way to announce a new (sub)set of capabilities to servers.
I understand. I might think about this the other way, though: if the Client can match content consumption/generation (e.g. image/*), then it can inform the User about the risk of shifting to a text-only model. Or, if a text-only model is in use, it can indicate to the User that this may be restrictive.
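The matching idea above, sketched as a Host-side check run when the User switches models. For brevity, generatesHint is flattened onto a ToolInfo record (in the proposal it lives in the Tool's annotations), "image/*" is treated as a wildcard by assumption, and toolsAtRisk is a hypothetical name.

```typescript
interface ToolInfo {
  name: string;
  generatesHint?: string[]; // proposed ToolAnnotation, flattened here
}

// Exact match, or "major/*" wildcard match, against an advertised list.
function covered(advertised: string[], mimeType: string): boolean {
  const major = mimeType.split("/")[0];
  return advertised.some((a) => a === mimeType || a === `${major}/*`);
}

// Names of Tools with at least one hinted output the newly selected model cannot
// tokenize, so the Host can warn the User. Tools without a hint are not flagged.
function toolsAtRisk(tools: ToolInfo[], newModelTokenizes: string[]): string[] {
  return tools
    .filter((t) => (t.generatesHint ?? []).some((m) => !covered(newModelTokenizes, m)))
    .map((t) => t.name);
}
```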
In the current state of this PR, yes we might want to warn the user about the risk. But if there were a way to signal a change in capabilities, then it would 'just work' (given a well-implemented MCP server) and we wouldn't have to warn the user about anything 🙂
There are loads of scenarios, and I think there's another idea going around about exposing direct model information to the MCP Server. I guess we need to figure out the right level of abstraction for the Protocol. We already have mid-lifecycle capability changes from Server to Client (e.g. ToolListChangedNotification), so it doesn't seem out of the question.
Instead of specifying content types as a capability during the initialization phase, what about specifying them as a _meta parameter on each relevant JSON-RPC request (similar to an Accept header)? That would avoid the need to restart.
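A minimal sketch of the per-request idea, assuming the hint is carried under a hypothetical _meta.contentTypes key on a tools/call request. params._meta itself exists on MCP requests; the contentTypes key inside it does not, and buildToolCall is illustrative only.

```typescript
interface ContentTypesHint {
  renders?: string[];
  tokenizes?: string[];
}

interface CallToolRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments?: Record<string, unknown>;
    _meta?: { contentTypes?: ContentTypesHint; [key: string]: unknown }; // contentTypes is hypothetical
  };
}

// Attach the current model's content types to each call, much like an HTTP Accept header.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
  hint: ContentTypesHint,
): CallToolRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args, _meta: { contentTypes: hint } },
  };
}
```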
~~Maybe, though the MCP server may also want to change their prompts when the user's chat model changes.~~ It's more likely they would change the prompt results, rather than the names/arguments that are announced to clients.
Follow-up thought: perhaps we should add contentTypes to ClientCapabilities, and then, instead of adding a _meta.contentTypes param, we add a _meta.capabilities param. That would support not only contentTypes, but also sampling, roots, etc.
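The follow-up suggestion, sketched on the Server side: a hypothetical _meta.capabilities object on a request overrides the capabilities negotiated at initialization, one top-level field at a time, for that request only. The shallow-merge rule is an assumption, not something proposed in the thread.

```typescript
interface ClientCapabilities {
  roots?: { listChanged?: boolean };
  sampling?: object;
  contentTypes?: { renders?: string[]; tokenizes?: string[] }; // proposed in this PR
}

interface RequestMeta {
  capabilities?: ClientCapabilities; // hypothetical per-request override
  [key: string]: unknown;
}

// Capabilities that apply to one request: the initialization values, with any top-level
// fields present in _meta.capabilities replacing them. The negotiated object is not mutated.
function effectiveCapabilities(negotiated: ClientCapabilities, meta?: RequestMeta): ClientCapabilities {
  return { ...negotiated, ...(meta?.capabilities ?? {}) };
}
```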
I can't see the harm in Clients using the _meta field for that; the point is to advertise to the MCP Server what can be handled (not a guarantee that if it's sent it will be handled; it's a hint). Ultimately it's the Host app's choice whether to allow the User to select a model or to optimize model selection.
We have to be careful - at some point the abstractions between Client and Server get so leaky that MCP is an inhibitor rather than an interop enabler....!
This issue is kind of related. I wonder if this could be combined or also supported in some way: https://github.com/modelcontextprotocol/modelcontextprotocol/discussions/604
I think there is a bigger issue here, which is "content negotiation". @connor4312's point on changing requirements between tool calls is a good example. I'll defer this for now, but I strongly suspect we want something different that is more akin to HTTP Accept headers on each request.
Here's an idea of how dynamic capabilities could be represented:
diff --git a/schema/draft/schema.ts b/schema/draft/schema.ts
index c688dc3..f9d66b5 100644
--- a/schema/draft/schema.ts
+++ b/schema/draft/schema.ts
@@ -200,9 +200,10 @@ export interface InitializedNotification extends Notification {
}
/**
- * Capabilities a client may support. Known capabilities are defined here, in this schema, but this is not a closed set: any client can define its own, additional capabilities.
+ * Part of {@link ClientCapabilities} which are sent during initialization and
+ * cannot be changed during the course of a session.
*/
-export interface ClientCapabilities {
+export interface StaticClientCapabilities {
/**
* Experimental, non-standard capabilities that the client supports.
*/
@@ -224,6 +225,13 @@ export interface ClientCapabilities {
* Present if the client supports elicitation from the server.
*/
elicitation?: object;
+}
+
+/**
+ * Part of {@link ClientCapabilities} which can be dynamically changed during
+ * the course of a session.
+ */
+export interface DynamicClientCapabilities {
/**
* Present if the client advertises content types it can handle.
*/
@@ -239,6 +247,12 @@ export interface ClientCapabilities {
};
}
+
+/**
+ * Capabilities a client may support. Known capabilities are defined here, in this schema, but this is not a closed set: any client can define its own, additional capabilities.
+ */
+export interface ClientCapabilities extends DynamicClientCapabilities, StaticClientCapabilities {}
+
/**
* Capabilities that a server may support. Known capabilities are defined here, in this schema, but this is not a closed set: any server can define its own, additional capabilities.
*/
@@ -1333,6 +1347,22 @@ export interface ElicitResult extends Result {
content?: { [key: string]: unknown };
}
+/**
+ * A notification from the client to the server, informing it that its capabilities
+ * have changed. This is typically used when the client has updated its underlying
+ * model or configuration.
+ */
+export interface ClientCapabilitiesChangedNotification extends Notification {
+ method: "notifications/client_capabilities/changed";
+ params: {
+ /**
+ * The new client capabilities that the client supports.
+ */
+ capabilities: DynamicClientCapabilities;
+ };
+}
+
+
/* Client messages */
export type ClientRequest =
| PingRequest
@@ -1353,7 +1383,8 @@ export type ClientNotification =
| CancelledNotification
| ProgressNotification
| InitializedNotification
- | RootsListChangedNotification;
+ | RootsListChangedNotification
+ | ClientCapabilitiesChangedNotification;
export type ClientResult = EmptyResult | CreateMessageResult | ListRootsResult | ElicitResult;
As MCP, and my understanding of it as a client implementor, have grown, I no longer think per-request headers are ideal, mainly because of sampling: sampling requests and responses will involve different content types and can be emitted asynchronously, outside the lifecycle of any particular client request. So I think a push mechanism for the client to announce changed capabilities is preferable.
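A sketch of how a Server might consume the push notification from the diff above, keeping a per-session copy of the dynamic capabilities. The method name and the DynamicClientCapabilities split come from that diff; the SessionCapabilities class, its replace-rather-than-merge behavior, and the absence of runtime validation are assumptions of this sketch.

```typescript
interface DynamicClientCapabilities {
  contentTypes?: { renders?: string[]; tokenizes?: string[] };
}

interface ClientCapabilitiesChangedNotification {
  jsonrpc: "2.0";
  method: "notifications/client_capabilities/changed";
  params: { capabilities: DynamicClientCapabilities };
}

// Per-session store on the Server. Each notification replaces the dynamic part wholesale,
// since the diff's `capabilities` field carries the complete new set.
class SessionCapabilities {
  private dynamic: DynamicClientCapabilities;

  constructor(initial: DynamicClientCapabilities) {
    this.dynamic = initial;
  }

  // Ignores unrelated notifications. No runtime validation of params in this sketch.
  handle(message: { jsonrpc?: string; method: string; params?: unknown }): void {
    if (message.method !== "notifications/client_capabilities/changed") return;
    const note = message as ClientCapabilitiesChangedNotification;
    this.dynamic = note.params.capabilities;
  }

  canTokenize(mimeType: string): boolean {
    return this.dynamic.contentTypes?.tokenizes?.includes(mimeType) ?? false;
  }
}
```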
Tagging @kentcdodds and referring to https://github.com/modelcontextprotocol/modelcontextprotocol/pull/679
I agree with @connor4312 here and (as a server implementer) I think that it would be useful to go both ways as well (so the server can announce changed capabilities as well as the client).
In general, what I mean by #679 is that both clients and servers should communicate both what they can offer and what they can accept.
Before now I hadn't considered the fact that these capabilities could change over time and I'm not sure I completely understand the use case there, but I do think that the client and server should both be able to communicate their full capabilities.
> I'm not sure I completely understand the use case there
VS Code and most other clients let you change the model you're using during a chat session, or even change it automatically depending on the query. Different models natively understand different sets of mimetypes, and we don't want to have to restart MCP servers to announce updated capabilities when that happens.
That makes complete sense. I don't want to take things too far off the rails here, but I think the discussion over on this issue has a bearing on what could happen in this PR as well: https://github.com/modelcontextprotocol/modelcontextprotocol/pull/679#issuecomment-2971991645