python-sdk
Mitigate prompt-injection risks from tool/resource content (sanitization & trust metadata)
Description
Summary
Tools and resources can currently return arbitrary content (text, JSON, etc.) that is forwarded directly to clients and often to LLMs. There is no built-in way to:
- sanitize content, or
- indicate its trust level.
This leaves room for prompt-injection attacks and other malicious payloads, especially when MCP servers interact with untrusted files or external APIs.
Problem
Common scenarios:
- A resource reads from `file://` or another untrusted source; the file can contain prompt-injection content such as "Ignore all previous instructions and …".
- A tool calls an external API; the HTTP response body is forwarded directly to the LLM.
- Content is serialized in a way that might encode protocol-like structures or control sequences the client is not expecting.
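To make the first scenario concrete, here is a minimal sketch (all names are illustrative, not SDK API) of how untrusted file content typically flows into a prompt with nothing marking it as untrusted:

```python
# Hypothetical illustration: a resource reads an untrusted file and its text
# is concatenated verbatim into the model prompt. Nothing on this path
# distinguishes the file's content from trusted instructions.
from pathlib import Path


def read_resource(path: str) -> str:
    """Read file:// content; function name and structure are illustrative."""
    return Path(path).read_text()


def build_prompt(user_question: str, resource_text: str) -> str:
    # Untrusted text lands directly next to trusted instructions.
    return f"Context:\n{resource_text}\n\nQuestion: {user_question}"
```

If the file contains "Ignore all previous instructions and …", that string reaches the model with the same standing as the rest of the prompt.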
The MCP SDK does not currently:
- offer a built-in `ContentSanitizer` or similar abstraction, or
- attach metadata to content that marks it as `trusted` vs `external` vs `user_provided`.
This makes it harder for client runtimes and LLM orchestrators to apply different safety policies depending on the source.
Proposal
- **Introduce a content sanitization hook**
  - Add an optional `ContentSanitizer` (or similar) interface that can be configured on the server:
    - `sanitize_text(text: str, meta: ContentMeta) -> str`
    - `sanitize_json(obj: Any, meta: ContentMeta) -> Any`
  - Provide a default implementation that is conservative but non-breaking (e.g., escaping obvious control sequences while leaving plain text mostly untouched).
  - Allow servers to plug in more aggressive sanitizers depending on their threat model.
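A rough sketch of what the hook could look like — `ContentSanitizer`, `ContentMeta`, and the default behavior below are proposals, not existing SDK API:

```python
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass
class ContentMeta:
    # Hypothetical metadata handed to the sanitizer; "source" is illustrative.
    source: str = "external"


class ContentSanitizer(Protocol):
    """Proposed pluggable interface, configured on the server."""

    def sanitize_text(self, text: str, meta: ContentMeta) -> str: ...
    def sanitize_json(self, obj: Any, meta: ContentMeta) -> Any: ...


class DefaultSanitizer:
    """Conservative, non-breaking default: drop control characters
    (e.g. ANSI escape bytes) while leaving printable text untouched."""

    def sanitize_text(self, text: str, meta: ContentMeta) -> str:
        return "".join(ch for ch in text if ch in "\t\n" or ch.isprintable())

    def sanitize_json(self, obj: Any, meta: ContentMeta) -> Any:
        # Recursively sanitize every string inside the JSON structure.
        if isinstance(obj, str):
            return self.sanitize_text(obj, meta)
        if isinstance(obj, list):
            return [self.sanitize_json(v, meta) for v in obj]
        if isinstance(obj, dict):
            return {k: self.sanitize_json(v, meta) for k, v in obj.items()}
        return obj
```

A stricter sanitizer (e.g., one that quotes or escapes instruction-like phrases) could implement the same protocol and be swapped in per the server's threat model.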
- **Content trust metadata**
  - Extend content structures with a `trust_level` field, something like:
    - `trusted` – server-generated system content,
    - `external` – from APIs, files, databases,
    - `user_provided` – direct user input / uploads.
  - This can be optional at first, defaulting to a sensible value, but enables clients/agents to treat content differently.
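In code, the field could look like the following sketch — `TrustLevel` and `TextContent` here are illustrative stand-ins, not the SDK's actual content types:

```python
from dataclasses import dataclass
from enum import Enum


class TrustLevel(str, Enum):
    # Hypothetical enum mirroring the levels proposed above.
    TRUSTED = "trusted"              # server-generated system content
    EXTERNAL = "external"            # from APIs, files, databases
    USER_PROVIDED = "user_provided"  # direct user input / uploads


@dataclass
class TextContent:
    """Illustrative content object; the real SDK types would gain an
    optional trust_level field with a sensible default."""

    text: str
    trust_level: TrustLevel = TrustLevel.EXTERNAL
```

Defaulting to `EXTERNAL` keeps the change non-breaking while failing safe: content is treated as untrusted unless the server explicitly marks it otherwise.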
- **Documentation & examples**
  - Add a “Security / Prompt Injection” section that:
    - explains common patterns where prompt injection can appear,
    - shows how to configure a sanitizer,
    - illustrates how trust levels can be used by clients.
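As one example of client-side use, a runtime could apply a policy keyed on the trust level — this is a hypothetical policy sketch, not prescribed behavior:

```python
def wrap_for_prompt(text: str, trust_level: str) -> str:
    """Hypothetical client policy: pass trusted content through unchanged,
    but fence anything else in labeled delimiters so the system prompt can
    instruct the model not to follow directives found inside them."""
    if trust_level == "trusted":
        return text
    return f"<untrusted_content>\n{text}\n</untrusted_content>"
```

Delimiting is not a complete defense against injection, but combined with sanitization and explicit trust metadata it gives clients a consistent place to enforce policy.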
Why this matters
- MCP is often used as a bridge between LLMs and external data sources.
- Prompt injection is one of the primary risks for GenAI systems today.
- Having first-class support for sanitization and trust metadata at the SDK level makes it easier for server authors and client runtimes to implement safe defaults.
Acceptance criteria
- [ ] A pluggable sanitization hook is exposed at the server layer.
- [ ] Content objects include optional trust metadata (or there is a clear extension point for it).
- [ ] Examples and docs illustrate how to configure sanitization and how clients can use trust metadata.
- [ ] The default behavior remains non-breaking but can be hardened by users.