
No default timeout for requests (unlike TS SDK)

Open ochafik opened this issue 3 months ago • 2 comments

Initial Checks

  • [x] I confirm that I'm using the latest version of MCP Python SDK
  • [x] I confirm that I searched for my issue in https://github.com/modelcontextprotocol/python-sdk/issues before opening this issue

Description

Spec states: Implementations SHOULD establish timeouts for all sent requests, to prevent hung connections and resource exhaustion.

The TypeScript SDK has a default request timeout of 60 seconds. That difference between the Python and TypeScript SDKs is the crux of https://github.com/modelcontextprotocol/typescript-sdk/issues/245 (which also suggests 60 seconds might not be long enough). Relatedly, the TypeScript SDK has a resetTimeoutOnProgress option (defaulting to false), which would probably be useful if we introduce a default timeout in the Python SDK.
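
To make the resetTimeoutOnProgress semantics concrete, here is a minimal sketch using only asyncio. The function name wait_for_response, its parameters, and the use of a queue to deliver progress notifications are illustrative assumptions and are not taken from either SDK.

```python
import asyncio
from typing import Any


async def wait_for_response(
    response: asyncio.Future,
    progress: asyncio.Queue,
    timeout: float,
    reset_timeout_on_progress: bool = False,
) -> Any:
    """Wait for `response`, optionally restarting the clock on each progress item.

    Hypothetical helper: `response` resolves when the server replies, and
    `progress` receives one item per progress notification.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        remaining = deadline - loop.time()
        if remaining <= 0:
            raise TimeoutError(f"no response within {timeout} seconds")

        progress_task = asyncio.ensure_future(progress.get())
        done, _pending = await asyncio.wait(
            {response, progress_task},
            timeout=remaining,
            return_when=asyncio.FIRST_COMPLETED,
        )

        if response in done:
            progress_task.cancel()
            return response.result()

        if progress_task in done:
            # Progress arrived: extend the deadline only if the option is set.
            if reset_timeout_on_progress:
                deadline = loop.time() + timeout
        else:
            # Neither completed before the deadline; the loop head raises.
            progress_task.cancel()
```

With reset_timeout_on_progress=False the timeout bounds the total wait, so a long-running tool that keeps reporting progress still fails. With it set to True the timeout instead bounds the silence between progress notifications.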

Example Code


Python & MCP Python SDK

Latest

ochafik avatar Sep 17 '25 15:09 ochafik

https://github.com/modelcontextprotocol/python-sdk/pull/1159 adds timeout support

felixweinberger avatar Oct 07 '25 13:10 felixweinberger

Thanks for opening this — having sensible timeout behavior is critical for production MCP usage, especially when LLMs are orchestrating many tool calls.

From a broader SDK perspective, there are two related gaps:

  1. Server-side tool timeouts

    • Tool and resource handlers are awaited without any server-side timeout wrapper (e.g. anyio.fail_after()).
    • A single hung tool can effectively stall a session unless the client enforces its own timeout and tears down the connection.
    • For typical GenAI workflows, agents often retry or fan out tool calls, so one hung tool can easily multiply into many hung requests.
  2. Retry policy surfaced to clients

    • On the client side, we currently have read_timeout_seconds but no higher-level RetryPolicy abstraction that distinguishes between:
      • transient errors (network, transport issues),
      • server-side timeouts,
      • non-retryable errors (validation, permission, etc.).
    • Without a structured view of error types, LLM agents end up guessing when to retry.
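
As a rough illustration of the first gap (server-side tool timeouts), a wrapper along these lines could bound each handler call. This sketch uses asyncio.wait_for from the standard library as a stand-in for anyio.fail_after(). The names ToolTimeoutError and run_tool_with_timeout are hypothetical and do not exist in the SDK.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class ToolTimeoutError(Exception):
    """Hypothetical error type for a tool that exceeded its time budget."""


async def run_tool_with_timeout(
    handler: Callable[[Dict[str, Any]], Awaitable[Any]],
    arguments: Dict[str, Any],
    timeout_seconds: float = 30.0,
) -> Any:
    """Await a tool handler, cancelling it if it runs past `timeout_seconds`."""
    try:
        return await asyncio.wait_for(handler(arguments), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the handler at this point.
        raise ToolTimeoutError(
            f"tool did not finish within {timeout_seconds} seconds"
        ) from None
```

Cancellation only takes effect at an await point, so a handler that blocks the event loop with synchronous work would not be interrupted by this wrapper.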

Concretely, it might be useful to extend this issue (or create a follow-up) to cover:

  • A configurable server-side timeout for tool execution (with a reasonable default, e.g. 30s), returning a clear “timeout” error on the protocol layer.
  • A client-side RetryPolicy or equivalent hook that:
    • treats timeouts and transient transport issues as retryable (with backoff),
    • treats validation/semantic errors as non-retryable by default.
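
A minimal sketch of what such a client-side hook might look like, assuming errors have already been classified into distinct exception types. Every name here (RetryPolicy, call_with_retry, TransientError, ServerTimeoutError, NonRetryableError) is hypothetical, and the exponential backoff is just one simple choice.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Tuple, Type


class TransientError(Exception):
    """Hypothetical: network or transport failure."""


class ServerTimeoutError(Exception):
    """Hypothetical: the server reported that the tool timed out."""


class NonRetryableError(Exception):
    """Hypothetical: validation, permission, or other semantic failure."""


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int = 3
    base_delay: float = 0.5  # seconds before the first retry
    max_delay: float = 8.0   # cap on the backoff delay
    retryable: Tuple[Type[BaseException], ...] = (TransientError, ServerTimeoutError)

    def delay_for(self, attempt: int) -> float:
        """Exponential backoff: base_delay, 2x, 4x, ... capped at max_delay."""
        return min(self.max_delay, self.base_delay * 2 ** (attempt - 1))


async def call_with_retry(
    operation: Callable[[], Awaitable[Any]],
    policy: RetryPolicy = RetryPolicy(),
) -> Any:
    """Run `operation`, retrying only the error types the policy marks retryable."""
    if policy.max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return await operation()
        except policy.retryable:
            if attempt == policy.max_attempts:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(policy.delay_for(attempt))
```

Errors outside policy.retryable, such as NonRetryableError, propagate on the first attempt, which matches the "non-retryable by default" behavior described above.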

If maintainers are open to this direction, I’d be happy to help draft a more detailed proposal and/or work on a PR that wires in:

  • a per-tool execution timeout on the server, and
  • a minimal RetryPolicy on the client with a simple backoff strategy and clear error messages.

dgenio avatar Nov 28 '25 11:11 dgenio

This is a valid gap - we should match the TypeScript SDK's behavior here.

What the spec says:

"Implementations SHOULD establish timeouts for all sent requests, to prevent hung connections and resource exhaustion. When the request has not received a success or error response within the timeout period, the sender SHOULD issue a cancellation notification for that request and stop waiting for a response."

This is specifically about client-side request timeouts - how long the client waits for a response from the server.
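
The behavior the spec describes (time out, send a cancellation notification, stop waiting) could be sketched roughly as follows. The helper names and callback signatures are assumptions made for illustration. The notification shape reflects my reading of the MCP spec (a notifications/cancelled method with requestId and reason params) and should be checked against the spec before relying on it.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


def cancelled_notification(request_id: int, reason: str) -> Dict[str, Any]:
    """Build a JSON-RPC cancellation notification (shape assumed from the MCP spec)."""
    return {
        "jsonrpc": "2.0",
        "method": "notifications/cancelled",
        "params": {"requestId": request_id, "reason": reason},
    }


async def request_with_timeout(
    await_response: Callable[[], Awaitable[Any]],
    send_notification: Callable[[Dict[str, Any]], Awaitable[None]],
    request_id: int,
    timeout_seconds: float,
) -> Any:
    """Wait for a response; on timeout, notify the peer and re-raise.

    `await_response` and `send_notification` are hypothetical callbacks
    standing in for whatever the transport layer provides.
    """
    try:
        return await asyncio.wait_for(await_response(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        await send_notification(
            cancelled_notification(request_id, "request timed out")
        )
        raise
```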

TypeScript SDK:

  • Default DEFAULT_REQUEST_TIMEOUT_MSEC = 60000 (60 seconds)
  • Per-request timeout option
  • resetTimeoutOnProgress option

Python SDK currently:

  • Has read_timeout_seconds (session-level) and request_read_timeout_seconds (per-request), but neither has a default value
  • No resetTimeoutOnProgress equivalent

Proposed fix: Add a default timeout of 60 seconds to match TS SDK, with the ability to override per-request. Could also add progress-based timeout reset as a follow-up.
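
One possible way to resolve the effective timeout under this proposal, sketched with hypothetical names (DEFAULT_REQUEST_TIMEOUT and resolve_timeout are not existing SDK identifiers): a per-request value wins, then a session-level value, then the 60-second default.

```python
from datetime import timedelta
from typing import Optional

# Assumed default, mirroring the TypeScript SDK's 60 seconds.
DEFAULT_REQUEST_TIMEOUT = timedelta(seconds=60)


def resolve_timeout(
    per_request: Optional[timedelta] = None,
    session_level: Optional[timedelta] = None,
) -> timedelta:
    """Pick the most specific timeout that was explicitly provided."""
    if per_request is not None:
        return per_request
    if session_level is not None:
        return session_level
    return DEFAULT_REQUEST_TIMEOUT
```

One design question this raises: if None means "not specified", callers need some other way, such as a sentinel value, to opt out of timeouts entirely.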

AI Disclaimer

maxisbey avatar Dec 03 '25 18:12 maxisbey