fastmcp Add response limiting middleware

Enhancement

Let's add middleware that limits the response size of tool calls, either by trimming the response when it is not structured or by raising a tool error.

Oct 04 '25 16:10 strawgate

Claude finished @strawgate's task

Investigation Complete ✓

Recommendation

Implement a new ResponseLimitingMiddleware in src/fastmcp/server/middleware/response_limiting.py that intercepts tool call responses and enforces size limits. The middleware should:

Check response size after tool execution
For structured responses (with output_schema): Raise a ToolError when exceeding limit
For unstructured responses (text-only): Truncate to the specified limit with an indicator
Expose configuration options: max size (bytes), truncation suffix, and per-tool filtering

This follows established patterns in FastMCP's existing middleware (LoggingMiddleware, ErrorHandlingMiddleware) and leverages the on_call_tool hook to target tool responses specifically.

Findings

Context

Issue #2004 requests middleware to limit response sizes from tool calls. This is particularly important for:

Preventing context window overflow in LLM applications
Managing memory usage for large responses
Enforcing data transfer limits in production environments

Architecture Analysis

Existing Middleware Patterns:

FastMCP has a mature middleware system with several implementations:

LoggingMiddleware (logging.py): Intercepts all messages, estimates payload size/tokens
ErrorHandlingMiddleware (error_handling.py): Transforms exceptions to MCP errors
TimingMiddleware (timing.py): Tracks request duration
RateLimitingMiddleware (rate_limiting.py): Enforces request frequency limits

All middleware classes inherit from the Middleware base class, which provides hooks such as:

on_message: All MCP messages
on_call_tool: Specifically for tool execution (perfect for this use case)
on_request: All requests expecting responses
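
To make the hook mechanics concrete, here is a minimal sketch of a post-processing hook that runs the downstream handler through call_next and then inspects the result. It uses only the standard library. `PostProcessingMiddleware`, `echo_tool`, and the string-based signature are hypothetical stand-ins, not FastMCP classes; the real on_call_tool hook receives a MiddlewareContext and returns a ToolResult.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical stand-in for CallNext: takes a tool name, returns a payload.
Handler = Callable[[str], Awaitable[str]]


async def echo_tool(tool_name: str) -> str:
    """Pretend tool execution: returns a payload for the given tool name."""
    return f"result of {tool_name}"


class PostProcessingMiddleware:
    """Runs the downstream handler first, then inspects its result."""

    def __init__(self) -> None:
        self.seen_sizes: List[int] = []

    async def on_call_tool(self, tool_name: str, call_next: Handler) -> str:
        result = await call_next(tool_name)  # 1. execute the tool
        self.seen_sizes.append(len(result))  # 2. inspect the response
        return result  # 3. pass it on (a limiter could modify it here)


middleware = PostProcessingMiddleware()
output = asyncio.run(middleware.on_call_tool("search", echo_tool))
```

Because the size check happens after call_next returns, the tool has already done its work by the time a limit is enforced; the hook only controls what is passed back up the chain.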

Tool Response Structure:

Tool calls return a ToolResult object (src/fastmcp/tools/tool.py:66-103) which contains:

content: List of ContentBlock objects (can include TextContent, ImageContent, etc.)
structured_content: Optional dict for tools with output_schema defined

Error Handling:

FastMCP defines ToolError (src/fastmcp/exceptions.py:18) for tool operation errors. The ErrorHandlingMiddleware already demonstrates how to transform errors in middleware.

Size Estimation:

LoggingMiddleware already implements payload size estimation:

payload_length = len(payload)  # Character count
payload_tokens = payload_length // 4  # Rough token estimate
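
Character count and byte count are not the same measurement, which matters when a limit is expressed in bytes. The sketch below is one way to compute both alongside the rough token estimate. `estimate_size` is a hypothetical helper, and `json.dumps` stands in for whichever serializer the middleware actually uses.

```python
import json
from typing import Dict


def estimate_size(payload_obj: object) -> Dict[str, int]:
    """Return character, byte, and rough token counts for a JSON payload."""
    # json.dumps is an assumption for this sketch, not FastMCP's serializer.
    payload = json.dumps(payload_obj, ensure_ascii=False)
    chars = len(payload)
    return {
        "chars": chars,
        "bytes": len(payload.encode("utf-8")),
        "tokens": chars // 4,  # same rough heuristic as the snippet above
    }


ascii_sizes = estimate_size({"text": "hello"})
accented_sizes = estimate_size({"text": "héllo"})
```

For the ASCII payload the character and byte counts agree; for the accented payload the byte count is one higher because "é" takes two bytes in UTF-8.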

Related Work

Issue #1844 - Response Caching Middleware:

Mentions "max size of item for cache would be good too, maybe 1MB default"
Draft PR #1845 implements caching with size limits
Shows pattern for handling response size considerations

Issue #1854 - Common Utilities for Serializing/Sizing:

Identifies need for consolidated utilities for estimating object size
LoggingMiddleware already uses pydantic_core.to_json() for serialization
Could potentially reuse this infrastructure

Issue #1988 - Call Tool Middleware:

Discusses how exceptions propagate through middleware stack
Relevant for understanding error handling in response limiting

PR #1805 - Refactor Logging Middleware:

Consolidated logging implementations
Shows best practices for middleware configuration
Includes payload size/token estimation

Detailed Action Plan

Implementation Steps

1. Create Response Limiting Middleware

File: src/fastmcp/server/middleware/response_limiting.py

Class Structure:

import logging

import mcp.types as mt

from fastmcp.server.middleware import CallNext, Middleware, MiddlewareContext
from fastmcp.tools.tool import ToolResult


class ResponseLimitingMiddleware(Middleware):
    def __init__(
        self,
        *,
        max_size: int = 1_000_000,  # 1MB default
        truncation_suffix: str = "\n\n[Response truncated due to size limit]",
        tools: list[str] | None = None,  # Limit only these tools; None means all tools
        raise_on_structured: bool = True,  # Raise error for structured responses
        logger: logging.Logger | None = None,
    ):
        ...

    async def on_call_tool(
        self,
        context: MiddlewareContext[mt.CallToolRequestParams],
        call_next: CallNext[mt.CallToolRequestParams, ToolResult],
    ) -> ToolResult:
        # 1. Execute tool via call_next
        # 2. Serialize result to measure size
        # 3. Check if size exceeds limit
        # 4. Handle based on structured vs unstructured
        # 5. Return modified or original result
        ...

Key Implementation Details:

Size Measurement:
- Use pydantic_core.to_json() to serialize the ToolResult
- Measure the byte length of serialized content
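- This parallels how LoggingMiddleware estimates size, though the snippet above counts characters (len(payload)) rather than bytes; the two differ for non-ASCII text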
Structured Response Handling:
- Check if ToolResult.structured_content is not None
- If structured AND over limit: Raise ToolError with clear message
- Rationale: Structured responses shouldn't be corrupted by truncation
Unstructured Response Handling:
- Extract text from TextContent blocks
- Truncate to max_size - len(truncation_suffix), measuring both in the same unit as max_size (encoded bytes) so the result stays within the limit
- Append the truncation suffix
- Create new ToolResult with truncated content
Tool Filtering:
- If tools parameter specified, only apply to those tools
- Use context.message.name to check tool name
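
The four behaviors above can be sketched together in plain Python. This is an illustration under stated assumptions, not the proposed implementation: `FakeToolResult` and the local `ToolError` are simplified stand-ins for the FastMCP classes, `json.dumps` stands in for pydantic_core.to_json(), and `limit_response`, `serialized_size`, and `truncate_utf8` are hypothetical helper names.

```python
import json
from dataclasses import dataclass
from typing import Optional, Sequence


class ToolError(Exception):
    """Stand-in for fastmcp.exceptions.ToolError (assumption for this sketch)."""


@dataclass
class FakeToolResult:
    """Simplified stand-in for ToolResult: one text block, optional structured data."""

    text: str
    structured_content: Optional[dict] = None


def serialized_size(result: FakeToolResult) -> int:
    """Byte length of the JSON form; json stands in for pydantic_core.to_json()."""
    payload = {"text": result.text, "structured_content": result.structured_content}
    return len(json.dumps(payload, ensure_ascii=False).encode("utf-8"))


def truncate_utf8(text: str, max_bytes: int, suffix: str) -> str:
    """Clip text so text + suffix fits in max_bytes without splitting a character."""
    budget = max(max_bytes - len(suffix.encode("utf-8")), 0)
    # errors="ignore" drops a multi-byte character cut in half by the slice.
    clipped = text.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    return clipped + suffix


def limit_response(
    tool_name: str,
    result: FakeToolResult,
    *,
    max_size: int,
    suffix: str = " [truncated]",
    tools: Optional[Sequence[str]] = None,
) -> FakeToolResult:
    """Apply the size limit to one tool result."""
    if tools is not None and tool_name not in tools:
        return result  # tool is outside the filter, leave it alone
    size = serialized_size(result)
    if size <= max_size:
        return result
    if result.structured_content is not None:
        # Truncating would corrupt structured data, so fail loudly instead.
        raise ToolError(f"Response from {tool_name!r} is {size} bytes (limit {max_size})")
    return FakeToolResult(text=truncate_utf8(result.text, max_size, suffix))
```

Note that this sketch bounds the truncated text itself, not the serialized envelope around it, so the re-serialized result can still be slightly over max_size. A real implementation would need to decide which of the two the limit applies to, and what to do when the suffix alone is longer than the limit.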

2. Add Unit Tests

File: tests/server/middleware/test_response_limiting.py

Test Cases:

Test structured response over limit raises ToolError
Test unstructured response over limit gets truncated
Test response under limit passes unchanged
Test tool filtering (only specific tools limited)
Test truncation suffix is applied correctly
Test with different content types (text, images, etc.)
Test integration with ErrorHandlingMiddleware

Example Test Structure:

from fastmcp import Client, FastMCP
from fastmcp.server.middleware.response_limiting import ResponseLimitingMiddleware


async def test_truncate_large_unstructured_response():
    mcp = FastMCP("test")
    mcp.add_middleware(ResponseLimitingMiddleware(max_size=100))

    @mcp.tool()
    def large_response() -> str:
        return "x" * 1000

    async with Client(mcp) as client:
        result = await client.call_tool("large_response", {})
        # Assert truncation occurred: text plus suffix fits within max_size
        assert len(result.content[0].text) <= 100
        assert "[Response truncated" in result.content[0].text

3. Update Documentation

File: docs/servers/middleware.mdx

Add section after existing middleware examples:

Content:

Overview of response limiting use cases
Configuration options
Example: Basic usage
Example: Different limits for different tools
Example: Handling structured vs unstructured responses
Best practices for setting size limits

4. Update Exports

File: src/fastmcp/server/middleware/__init__.py

Add:

from .response_limiting import ResponseLimitingMiddleware

5. Integration Points

Consider these integration scenarios:

With LoggingMiddleware:
- Log when truncation occurs
- Include original vs final size in logs
With ErrorHandlingMiddleware:
- Ensure ToolError for structured responses is handled properly
- Test error transformation and logging
With CachingMiddleware (future):
- Response size might affect caching decisions
- Could share size calculation utilities
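
As one possible shape for the logging integration, the sketch below emits a warning with the original and final sizes when truncation occurs. `log_truncation` and the logger name are hypothetical; only the standard library logging module is used.

```python
import logging

# Hypothetical logger name for this sketch.
logger = logging.getLogger("response_limiting_sketch")


def log_truncation(tool_name: str, original_size: int, final_size: int, max_size: int) -> str:
    """Log a truncation event and return the message for inspection."""
    message = (
        f"Truncated response from tool {tool_name!r}: "
        f"{original_size} -> {final_size} bytes (limit {max_size})"
    )
    logger.warning(message)
    return message


logged = log_truncation("search", 5000, 1000, 1000)
```

Returning the message is only a convenience for testing; a middleware would more likely log and move on.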

Testing Strategy

Unit Tests: Test middleware in isolation
Integration Tests: Test with other middleware
Performance Tests: Ensure minimal overhead for small responses
Edge Cases: Empty responses, binary content, mixed content blocks

Alternative Considerations

Token-Based Limiting:

Could use length // 4 like LoggingMiddleware
More relevant for LLM applications
Could be a future enhancement
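
A token-based variant could reuse the same length // 4 heuristic and convert the token budget back into a character budget. The following is a rough sketch; `estimate_tokens` and `truncate_to_token_budget` are hypothetical names, and the four-characters-per-token ratio is only the approximation mentioned above, not a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about four characters per token."""
    return len(text) // 4


def truncate_to_token_budget(text: str, max_tokens: int, suffix: str = "") -> str:
    """Clip text so that the estimated token count stays within max_tokens."""
    if estimate_tokens(text) <= max_tokens:
        return text
    # Convert the token budget to characters and reserve room for the suffix.
    max_chars = max(max_tokens * 4 - len(suffix), 0)
    return text[:max_chars] + suffix


clipped = truncate_to_token_budget("a" * 1000, 100, suffix=" [cut]")
```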

Streaming/Chunking:

Could implement progressive size checking for streaming responses
More complex but could prevent wasted computation
Consider for future enhancement
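
Progressive checking could look roughly like the generator below, which keeps a running byte total and stops consuming the source as soon as the limit is crossed. This is a generic sketch: `limited_chunks` and `ResponseTooLarge` are hypothetical, and it does not assume anything about how, or whether, FastMCP tool results are streamed.

```python
from typing import Iterable, Iterator


class ResponseTooLarge(Exception):
    """Hypothetical error raised when the running total exceeds the limit."""


def limited_chunks(chunks: Iterable[bytes], max_size: int) -> Iterator[bytes]:
    """Yield chunks until their cumulative size would exceed max_size."""
    total = 0
    for chunk in chunks:
        total += len(chunk)
        if total > max_size:
            # Stop pulling from the source so no further work is wasted.
            raise ResponseTooLarge(f"response exceeded {max_size} bytes")
        yield chunk
```

Because the generator stops pulling from the source once the limit is exceeded, a lazy producer does no further work past that point.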

Utility Function:

Could extract size calculation to src/fastmcp/utilities/ per issue #1854
Would benefit both this middleware and caching middleware
Recommend doing this if implementing both features

Related Issues

Repository	Issue	Relevance
jlowin/fastmcp	Response Caching Middleware #1844	Also needs max response size handling; mentions "1MB default" as a good limit. Draft PR #1845 implements this.
jlowin/fastmcp	Common utilities for serializing/deserializing and estimating size #1854	Proposes consolidated utility functions for sizing MCP response types - could be leveraged here.
jlowin/fastmcp	Call tool middleware #1988	Discusses exception propagation through middleware stack - relevant for error handling.
jlowin/fastmcp	Add request duration to logging middleware #1973	Recent middleware enhancement showing active middleware development.

Related Pull Requests

Repository	PR	Relevance
jlowin/fastmcp	Add Response Caching Middleware #1845	Draft PR implementing caching with max item size support - similar size limiting concerns.
jlowin/fastmcp	Refactor Logging and Structured Logging Middleware #1805	Merged - shows best practices for middleware implementation, includes payload size estimation.
jlowin/fastmcp	Updates to Logging Middleware #1974	Open - adds request duration tracking; shows ongoing middleware improvements.
jlowin/fastmcp	Internal refactor of MCP handlers #2005	Open - refactors handler naming conventions; may affect middleware patterns.

Related Files

Repository	File	Relevance	Key Sections
jlowin/fastmcp	middleware.py	High - Base class and patterns	80-196 - Middleware base class, `on_call_tool` hook
jlowin/fastmcp	logging.py	High - Size estimation pattern	32-46 - Payload serialization, 69-77 - Size/token estimation
jlowin/fastmcp	error_handling.py	High - Error transformation	78-108 - Exception handling and transformation
jlowin/fastmcp	tool.py	High - ToolResult structure	66-103 - ToolResult class definition
jlowin/fastmcp	exceptions.py	Medium - Error types	18-19 - ToolError definition
jlowin/fastmcp	middleware.mdx	High - Documentation structure	1-100 - Middleware documentation patterns

Oct 05 '25 01:10 marvin-context-protocol[bot]

Such middleware could also be used for, e.g., additional logging.

Oct 14 '25 06:10 jessealama