Add response limiting middleware
Enhancement
Let's add middleware that limits the response size of tool calls, either trimming the response if it's not a structured response or just raising a tool error.
Claude finished @strawgate's task
Investigation Complete ✓
Recommendation
Implement a new ResponseLimitingMiddleware in src/fastmcp/server/middleware/response_limiting.py that intercepts tool call responses and enforces size limits. The middleware should:
- Check response size after tool execution
- For structured responses (with output_schema): Raise a ToolError when exceeding limit
- For unstructured responses (text-only): Truncate to the specified limit with an indicator
- Configuration options: Max size (bytes), truncation suffix, specific tool filtering
This follows established patterns in FastMCP's existing middleware (LoggingMiddleware, ErrorHandlingMiddleware) and leverages the on_call_tool hook to target tool responses specifically.
Findings
Context
Issue #2004 requests middleware to limit response sizes from tool calls. This is particularly important for:
- Preventing context window overflow in LLM applications
- Managing memory usage for large responses
- Enforcing data transfer limits in production environments
Architecture Analysis
Existing Middleware Patterns:
FastMCP has a mature middleware system with several implementations:
- LoggingMiddleware (logging.py): Intercepts all messages, estimates payload size/tokens
- ErrorHandlingMiddleware (error_handling.py): Transforms exceptions to MCP errors
- TimingMiddleware (timing.py): Tracks request duration
- RateLimitingMiddleware (rate_limiting.py): Enforces request frequency limits
All middleware inherit from the Middleware base class which provides hooks like:
- on_message: All MCP messages
- on_call_tool: Specifically for tool execution (perfect for this use case)
- on_request: All requests expecting responses
Tool Response Structure:
Tool responses return a ToolResult object (src/fastmcp/tools/tool.py:66-103) which contains:
- content: List of ContentBlock objects (can include TextContent, ImageContent, etc.)
- structured_content: Optional dict for tools with output_schema defined
Error Handling:
FastMCP defines ToolError (src/fastmcp/exceptions.py:18) for tool operation errors. The ErrorHandlingMiddleware already demonstrates how to transform errors in middleware.
Size Estimation:
LoggingMiddleware already implements payload size estimation:
    payload_length = len(payload)  # Character count
    payload_tokens = payload_length // 4  # Rough token estimate
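Since the proposed limit is expressed in bytes while the snippet above counts characters, it may help to see how the two can differ. The following is a minimal sketch, not LoggingMiddleware's actual code: `estimate_size` is a hypothetical helper, and the standard-library `json` module stands in for `pydantic_core.to_json()` so the example runs without extra dependencies.

```python
import json


def estimate_size(payload: object) -> tuple[int, int, int]:
    """Return (characters, utf8_bytes, approx_tokens) for a JSON-serializable payload."""
    # json.dumps is a stdlib stand-in; the middleware discussed here is
    # described as serializing with pydantic_core.to_json() instead.
    text = json.dumps(payload, ensure_ascii=False)
    chars = len(text)
    size_bytes = len(text.encode("utf-8"))  # non-ASCII characters take more than one byte
    approx_tokens = chars // 4  # same rough heuristic as the snippet above
    return chars, size_bytes, approx_tokens


if __name__ == "__main__":
    print(estimate_size({"text": "héllo wörld"}))
```

For payloads containing non-ASCII text the byte count exceeds the character count, so a byte-based limit and a character-based estimate will not always agree.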
Related Work
Issue #1844 - Response Caching Middleware:
- Mentions "max size of item for cache would be good too, maybe 1MB default"
- Draft PR #1845 implements caching with size limits
- Shows pattern for handling response size considerations
Issue #1854 - Common Utilities for Serializing/Sizing:
- Identifies need for consolidated utilities for estimating object size
- LoggingMiddleware already uses pydantic_core.to_json() for serialization
- Could potentially reuse this infrastructure
Issue #1988 - Call Tool Middleware:
- Discusses how exceptions propagate through middleware stack
- Relevant for understanding error handling in response limiting
PR #1805 - Refactor Logging Middleware:
- Consolidated logging implementations
- Shows best practices for middleware configuration
- Includes payload size/token estimation
Detailed Action Plan
Implementation Steps
1. Create Response Limiting Middleware
File: src/fastmcp/server/middleware/response_limiting.py
Class Structure:
    # Imports assumed from the existing middleware modules
    import logging

    import mcp.types as mt

    from fastmcp.tools.tool import ToolResult

    from .middleware import CallNext, Middleware, MiddlewareContext


    class ResponseLimitingMiddleware(Middleware):
        def __init__(
            self,
            *,
            max_size: int = 1_000_000,  # 1MB default
            truncation_suffix: str = "\n\n[Response truncated due to size limit]",
            tools: list[str] | None = None,  # Filter specific tools
            raise_on_structured: bool = True,  # Raise error for structured responses
            logger: logging.Logger | None = None,
        ):
            ...

        async def on_call_tool(
            self,
            context: MiddlewareContext[mt.CallToolRequestParams],
            call_next: CallNext[mt.CallToolRequestParams, ToolResult],
        ) -> ToolResult:
            # 1. Execute tool via call_next
            # 2. Serialize result to measure size
            # 3. Check if size exceeds limit
            # 4. Handle based on structured vs unstructured
            # 5. Return modified or original result
            ...
Key Implementation Details:
- Size Measurement:
  - Use pydantic_core.to_json() to serialize the ToolResult
  - Measure the byte length of serialized content
  - This matches how LoggingMiddleware estimates size
- Structured Response Handling:
  - Check if ToolResult.structured_content is not None
  - If structured AND over limit: Raise ToolError with clear message
  - Rationale: Structured responses shouldn't be corrupted by truncation
- Unstructured Response Handling:
  - Extract text from TextContent blocks
  - Truncate to max_size - len(truncation_suffix)
  - Append the truncation suffix
  - Create new ToolResult with truncated content
- Tool Filtering:
  - If tools parameter specified, only apply to those tools
  - Use context.message.name to check tool name
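The handling rules above can be sketched without FastMCP installed. This is an illustration under stated assumptions, not the proposed implementation: `ToolError`, `TextContent`, and `ToolResult` are simplified stand-ins for the real classes, `json` stands in for `pydantic_core.to_json()`, and `limit_response`, `result_size`, `big_text`, and `big_structured` are hypothetical names. The real middleware would put this logic in `on_call_tool` and read the tool name from `context.message.name`.

```python
from __future__ import annotations

import asyncio
import json
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


class ToolError(Exception):
    """Stand-in for fastmcp.exceptions.ToolError."""


@dataclass
class TextContent:
    """Stand-in for a text content block."""

    text: str


@dataclass
class ToolResult:
    """Stand-in for ToolResult: content blocks plus optional structured data."""

    content: list[TextContent]
    structured_content: dict[str, Any] | None = None


def result_size(result: ToolResult) -> int:
    """Byte length of the serialized result (json stands in for pydantic_core.to_json)."""
    payload = {
        "content": [block.text for block in result.content],
        "structured_content": result.structured_content,
    }
    return len(json.dumps(payload).encode("utf-8"))


async def limit_response(
    tool_name: str,
    call_next: Callable[[], Awaitable[ToolResult]],
    *,
    max_size: int = 1_000_000,
    truncation_suffix: str = "\n\n[Response truncated due to size limit]",
    tools: list[str] | None = None,
) -> ToolResult:
    """Hypothetical core of on_call_tool, written as a free function."""
    result = await call_next()  # run the tool first, then inspect its result
    if tools is not None and tool_name not in tools:
        return result  # tool is not in the filter list: leave it untouched
    if result_size(result) <= max_size:
        return result
    if result.structured_content is not None:
        # Truncating structured data would corrupt it, so fail loudly instead.
        raise ToolError(f"Response from {tool_name!r} exceeds {max_size} bytes")
    # Unstructured: join the text blocks and leave room for the suffix.
    text = "".join(block.text for block in result.content)
    keep = max(max_size - len(truncation_suffix), 0)
    return ToolResult(content=[TextContent(text[:keep] + truncation_suffix)])


async def big_text() -> ToolResult:
    return ToolResult(content=[TextContent("x" * 1000)])


async def big_structured() -> ToolResult:
    return ToolResult(content=[TextContent("{}")], structured_content={"rows": ["x" * 1000]})


async def demo() -> tuple[str, str]:
    truncated = await limit_response("big_text", big_text, max_size=100)
    try:
        await limit_response("big_structured", big_structured, max_size=100)
        outcome = "returned"
    except ToolError:
        outcome = "raised"
    return truncated.content[0].text, outcome


if __name__ == "__main__":
    text, outcome = asyncio.run(demo())
    print(len(text), outcome)
```

One design point the sketch surfaces: the limit is checked against serialized bytes, while truncation cuts by characters. A real implementation would need to decide whether the truncated text should be re-measured after serialization, and what to do when max_size is smaller than the suffix itself.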
2. Add Unit Tests
File: tests/server/middleware/test_response_limiting.py
Test Cases:
- Test structured response over limit raises ToolError
- Test unstructured response over limit gets truncated
- Test response under limit passes unchanged
- Test tool filtering (only specific tools limited)
- Test truncation suffix is applied correctly
- Test with different content types (text, images, etc.)
- Test integration with ErrorHandlingMiddleware
Example Test Structure:
    from fastmcp import Client, FastMCP
    # Proposed module from step 1; it does not exist yet
    from fastmcp.server.middleware.response_limiting import ResponseLimitingMiddleware


    async def test_truncate_large_unstructured_response():
        mcp = FastMCP("test")
        mcp.add_middleware(ResponseLimitingMiddleware(max_size=100))

        @mcp.tool()
        def large_response() -> str:
            return "x" * 1000

        async with Client(mcp) as client:
            result = await client.call_tool("large_response", {})
            # Assert truncation occurred
            assert len(result.content[0].text) <= 100
            assert "[Response truncated" in result.content[0].text
3. Update Documentation
File: docs/servers/middleware.mdx
Add section after existing middleware examples:
Content:
- Overview of response limiting use cases
- Configuration options
- Example: Basic usage
- Example: Different limits for different tools
- Example: Handling structured vs unstructured responses
- Best practices for setting size limits
4. Update Exports
File: src/fastmcp/server/middleware/__init__.py
Add:
from .response_limiting import ResponseLimitingMiddleware
5. Integration Points
Consider these integration scenarios:
- With LoggingMiddleware:
  - Log when truncation occurs
  - Include original vs final size in logs
- With ErrorHandlingMiddleware:
  - Ensure ToolError for structured responses is handled properly
  - Test error transformation and logging
- With CachingMiddleware (future):
  - Response size might affect caching decisions
  - Could share size calculation utilities
Testing Strategy
- Unit Tests: Test middleware in isolation
- Integration Tests: Test with other middleware
- Performance Tests: Ensure minimal overhead for small responses
- Edge Cases: Empty responses, binary content, mixed content blocks
Alternative Considerations
Token-Based Limiting:
- Could use length // 4 like LoggingMiddleware
- More relevant for LLM applications
- Could be a future enhancement
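A token-based variant could look roughly like the sketch below. It assumes the same four-characters-per-token heuristic quoted from LoggingMiddleware; `truncate_to_token_budget` and its parameters are hypothetical names, and a real tokenizer would give different counts.

```python
def truncate_to_token_budget(
    text: str,
    max_tokens: int,
    suffix: str = "\n\n[Response truncated due to size limit]",
    chars_per_token: int = 4,  # rough heuristic, not a real tokenizer
) -> str:
    """Truncate text so its estimated token count stays within max_tokens."""
    max_chars = max_tokens * chars_per_token
    if len(text) <= max_chars:
        return text  # already within budget
    # Leave room for the suffix so the final string still fits the budget.
    keep = max(max_chars - len(suffix), 0)
    return text[:keep] + suffix


if __name__ == "__main__":
    out = truncate_to_token_budget("x" * 1000, max_tokens=50)
    print(len(out), len(out) // 4)
```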
Streaming/Chunking:
- Could implement progressive size checking for streaming responses
- More complex but could prevent wasted computation
- Consider for future enhancement
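To make the "prevent wasted computation" point concrete, here is a small sketch of progressive size checking over a chunked response. It is purely illustrative: the tool responses described in this issue are complete ToolResult objects, so the chunked producer, `limit_stream`, `ResponseTooLarge`, and `consume_with_limit` are all hypothetical.

```python
from collections.abc import Iterable, Iterator


class ResponseTooLarge(Exception):
    """Hypothetical error raised once the running total exceeds the limit."""


def limit_stream(chunks: Iterable[str], max_size: int) -> Iterator[str]:
    """Yield chunks while the running UTF-8 byte total stays within max_size."""
    total = 0
    for chunk in chunks:
        total += len(chunk.encode("utf-8"))
        if total > max_size:
            # Stop pulling from the producer as soon as the limit is crossed.
            raise ResponseTooLarge(f"response exceeded {max_size} bytes")
        yield chunk


def consume_with_limit(chunk_count: int, chunk_size: int, max_size: int) -> tuple[int, int]:
    """Return (chunks received, chunks produced) for a lazy producer."""
    produced = 0

    def producer() -> Iterator[str]:
        nonlocal produced
        for _ in range(chunk_count):
            produced += 1  # counts the work the producer actually did
            yield "x" * chunk_size

    received = 0
    try:
        for _ in limit_stream(producer(), max_size):
            received += 1
    except ResponseTooLarge:
        pass
    return received, produced


if __name__ == "__main__":
    print(consume_with_limit(chunk_count=100, chunk_size=10, max_size=35))
```

Because the producer is lazy, it stops generating chunks shortly after the limit is crossed rather than building the full response first.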
Utility Function:
- Could extract size calculation to src/fastmcp/utilities/ per issue #1854
- Would benefit both this middleware and caching middleware
- Recommend doing this if implementing both features
Related Items
Related Issues
| Repository | Issue | Relevance |
|---|---|---|
| jlowin/fastmcp | Response Caching Middleware #1844 | Also needs max response size handling; mentions "1MB default" as a good limit. Draft PR #1845 implements this. |
| jlowin/fastmcp | Common utilities for serializing/deserializing and estimating size #1854 | Proposes consolidated utility functions for sizing MCP response types - could be leveraged here. |
| jlowin/fastmcp | Call tool middleware #1988 | Discusses exception propagation through middleware stack - relevant for error handling. |
| jlowin/fastmcp | Add request duration to logging middleware #1973 | Recent middleware enhancement showing active middleware development. |
Related Pull Requests
| Repository | PR | Relevance |
|---|---|---|
| jlowin/fastmcp | Add Response Caching Middleware #1845 | Draft PR implementing caching with max item size support - similar size limiting concerns. |
| jlowin/fastmcp | Refactor Logging and Structured Logging Middleware #1805 | Merged - shows best practices for middleware implementation, includes payload size estimation. |
| jlowin/fastmcp | Updates to Logging Middleware #1974 | Open - adds request duration tracking; shows ongoing middleware improvements. |
| jlowin/fastmcp | Internal refactor of MCP handlers #2005 | Open - refactors handler naming conventions; may affect middleware patterns. |
Related Files
| Repository | File | Relevance | Key Sections |
|---|---|---|---|
| jlowin/fastmcp | middleware.py | High - Base class and patterns | 80-196 - Middleware base class, on_call_tool hook |
| jlowin/fastmcp | logging.py | High - Size estimation pattern | 32-46 - Payload serialization, 69-77 - Size/token estimation |
| jlowin/fastmcp | error_handling.py | High - Error transformation | 78-108 - Exception handling and transformation |
| jlowin/fastmcp | tool.py | High - ToolResult structure | 66-103 - ToolResult class definition |
| jlowin/fastmcp | exceptions.py | Medium - Error types | 18-19 - ToolError definition |
| jlowin/fastmcp | middleware.mdx | High - Documentation structure | 1-100 - Middleware documentation patterns |
Such middleware could also be used for, e.g., additional logging.