Add configurable concurrency limits and backpressure for MCP servers

Open • dgenio opened this issue 1 month ago • 0 comments

Description

Summary

MCP servers built with the Python SDK currently process incoming requests as fast as they arrive, limited mostly by system resources. There are no built-in mechanisms to:

  • limit concurrent tool executions,
  • apply backpressure when the server is overloaded,
  • surface overload conditions in a structured way.

This is particularly important when LLMs or agents can generate many tool calls in parallel.

Problems

  • DoS / overload risk: A buggy or malicious client can issue hundreds or thousands of tool calls in parallel.
  • Resource exhaustion: Long-running tools can accumulate in flight, consuming memory and CPU.
  • No explicit overload signal: Clients have no way to know the server is overloaded beyond timeouts or generic errors.

Proposal

  1. Configurable concurrency limits

    • Add a setting (e.g. max_concurrent_tools) enforced by a semaphore in the server:
      • Only N tool executions are active at once.
      • Additional requests wait in a queue, up to a limit.
  2. Request queue and overload handling

    • Maintain a small queue of pending requests.
    • When the queue is full, reject new requests immediately with a clear, structured error (e.g. “server overloaded”, with semantics similar to HTTP 429).
  3. Backpressure integration

    • Where transports support it (e.g. HTTP status codes such as 429 Too Many Requests), reflect overload in transport responses.
    • For other transports, return a well-defined MCP error code indicating overload.
  4. Configuration & docs

    • Expose configuration parameters with sensible defaults.
    • Document how to tune these settings for different deployment scenarios.
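
To make items 1 and 2 concrete, here is a minimal sketch of a semaphore plus a bounded wait queue. It is not existing SDK API: `ConcurrencyLimiter`, `ServerOverloadedError`, `max_queue_size`, and the default values are hypothetical names chosen for illustration. Only `asyncio.Semaphore` is a real standard-library primitive.

```python
import asyncio
from dataclasses import dataclass


class ServerOverloadedError(Exception):
    """Hypothetical error raised when the pending queue is full."""


@dataclass
class ConcurrencyLimiter:
    # Both defaults are placeholders, not values proposed for the SDK.
    max_concurrent_tools: int = 8
    max_queue_size: int = 32

    def __post_init__(self) -> None:
        self._semaphore = asyncio.Semaphore(self.max_concurrent_tools)
        self._waiting = 0  # requests currently waiting for a slot

    async def run(self, tool, *args):
        """Run `tool(*args)` within the limit, or reject if the queue is full."""
        # Reject right away when every slot is busy and the queue is full.
        if self._semaphore.locked() and self._waiting >= self.max_queue_size:
            raise ServerOverloadedError("server overloaded")

        self._waiting += 1
        try:
            await self._semaphore.acquire()
        finally:
            # Runs on success and on cancellation, so the count stays accurate.
            self._waiting -= 1

        try:
            return await tool(*args)
        finally:
            self._semaphore.release()


async def main() -> None:
    # Create the limiter inside the running event loop.
    limiter = ConcurrencyLimiter(max_concurrent_tools=2, max_queue_size=1)

    async def slow_tool(i: int) -> int:
        await asyncio.sleep(0.05)
        return i

    results = await asyncio.gather(
        *(limiter.run(slow_tool, i) for i in range(5)),
        return_exceptions=True,
    )
    for r in results:
        print("rejected" if isinstance(r, ServerOverloadedError) else f"ok: {r}")


if __name__ == "__main__":
    asyncio.run(main())
```

With 2 slots and a queue of 1, five simultaneous calls result in two running, one waiting, and two rejected immediately. A real implementation would also need to decide whether the limit is per session or global, which this sketch does not address.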

Why this matters

  • Robustness: Servers degrade gracefully under load instead of crashing or hanging.
  • Predictability: Clients and LLM agents can interpret overload errors and adjust behavior (e.g., backoff and retry).
  • Security: Basic protection against accidental or deliberate flooding.
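
As an illustration of the "backoff and retry" behavior mentioned under Predictability, a client could wrap calls roughly like this. This is a sketch under assumptions: `ServerOverloadedError` stands in for whatever exception a client would raise on the proposed overload error, and `call_with_backoff` with its parameters is a hypothetical helper, not part of the SDK.

```python
import asyncio
import random


class ServerOverloadedError(Exception):
    """Hypothetical client-side exception for a 'server overloaded' error."""


async def call_with_backoff(call, *, max_attempts=5, base_delay=0.1, max_delay=5.0):
    """Retry `call()` on overload, using exponential backoff with full jitter."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except ServerOverloadedError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Cap the exponential delay, then pick a random point below it
            # so that many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(random.uniform(0, delay))
```

Without a distinct overload error, a client cannot tell this case apart from a tool failure and has no safe basis for retrying.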

Acceptance criteria

  • [ ] Server supports a configurable max_concurrent_tools limit.
  • [ ] Server supports a bounded queue for pending requests and rejects new ones when full.
  • [ ] Overload conditions are surfaced via a clear, documented error code.
  • [ ] Documentation describes how to configure and interpret these limits.
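
For the error-code criterion, one possible shape of the overload signal is sketched below. The code value `-32050` is an arbitrary placeholder inside the range JSON-RPC 2.0 reserves for implementation-defined server errors (-32000 to -32099); the actual code would need to be chosen by the maintainers. The function names and the `retry_after_seconds` field are likewise hypothetical.

```python
import json
import math

# Placeholder only: any value in -32099..-32000 is implementation-defined in JSON-RPC 2.0.
OVERLOAD_ERROR_CODE = -32050


def overload_jsonrpc_error(request_id, retry_after_s: float) -> dict:
    """Build a JSON-RPC 2.0 error response signalling overload."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {
            "code": OVERLOAD_ERROR_CODE,
            "message": "Server overloaded",
            # Hypothetical hint so clients know how long to back off.
            "data": {"retry_after_seconds": retry_after_s},
        },
    }


def overload_http_response(request_id, retry_after_s: float):
    """For HTTP transports: return (status, headers, body) using 429 and Retry-After."""
    body = json.dumps(overload_jsonrpc_error(request_id, retry_after_s))
    headers = {
        "Content-Type": "application/json",
        # Retry-After takes whole seconds, so round up.
        "Retry-After": str(math.ceil(retry_after_s)),
    }
    return 429, headers, body
```

Non-HTTP transports such as stdio would send only the JSON-RPC error object, so the error code itself has to carry the overload meaning.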

dgenio, Nov 28 '25 15:11