
Performance Issues with Server-side Multi-instance Deployment

Open · konglinghai123 opened this issue 1 month ago


Summary

When deploying Claude Agent SDK in a server-side environment with multiple instances, we're experiencing significant performance bottlenecks that impact production use cases.

Issues

1. Slow Initialization for Each Instance

Every time we create a new SDK instance, the initialization process is extremely slow (20-30+ seconds based on related issues #2166, #3044). In a server environment handling multiple concurrent requests, this startup time is prohibitive.

Impact:

  • High latency for initial requests
  • Poor user experience
  • Resources wasted while waiting for initialization

2. Process Resource Consumption with Multiple Sessions

When managing multiple active sessions simultaneously, the resource consumption (memory, CPU) accumulates significantly, leading to:

  • Increased server costs
  • Potential memory leaks or performance degradation over time (similar to #10881)
  • Difficulty in horizontal scaling

3. No Way to Maintain "Warm" Instances

Currently, there's no official mechanism to:

  • Keep SDK processes in a "warm" state between requests
  • Reuse initialized instances across different sessions
  • Dynamically switch tool/working directories for a running instance

Feature Request: Dynamic Tool Directory Switching

To address these performance issues, we would like to request a feature that allows dynamic switching of tool directories for an already-initialized SDK instance.

Proposed Solution:

from claude_agent_sdk import ClaudeSDKClient

# Initialize once (warm instance); the slow startup cost is paid a single time
client = ClaudeSDKClient()

# Dynamically switch the tool/working directory per session
# (set_tool_directory and send_message are the proposed API, not existing methods)
client.set_tool_directory("/path/to/session1/tools")
response1 = client.send_message("Task for session 1")

# Reuse the same instance for a different session
client.set_tool_directory("/path/to/session2/tools")
response2 = client.send_message("Task for session 2")

Benefits:

  1. Instance Pooling: Create a pool of pre-warmed SDK instances that can be reused (see the sketch after this list)
  2. Reduced Latency: Eliminate the 20-30s initialization delay on each request
  3. Better Resource Utilization: Maintain fewer processes while handling more sessions
  4. Improved Scalability: Enable efficient horizontal scaling in server environments
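
For concreteness, here is a minimal sketch of the kind of pre-warmed pool we have in mind, written against the ClaudeSDKClient / ClaudeAgentOptions API as we understand it (pool size, paths, and the single shared cwd are illustrative assumptions). It also shows today's limitation: every request served by a given warm client shares that client's working directory and conversation state.

import asyncio

from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient

POOL_SIZE = 4  # illustrative

async def build_pool() -> asyncio.Queue:
    """Pay the slow startup cost once, at server start, for a fixed set of clients."""
    pool: asyncio.Queue = asyncio.Queue()
    for _ in range(POOL_SIZE):
        client = ClaudeSDKClient(
            options=ClaudeAgentOptions(cwd="/srv/workspaces/shared")  # fixed at startup
        )
        await client.connect()  # the 20-30s step we want to pay only once
        await pool.put(client)
    return pool

async def handle_request(pool: asyncio.Queue, prompt: str) -> list:
    """Borrow a warm client, run one task, and return the client to the pool."""
    client = await pool.get()
    try:
        await client.query(prompt)
        return [message async for message in client.receive_response()]
    finally:
        await pool.put(client)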

Current Workarounds Attempted

  • Creating instances on-demand: Too slow (20-30s per instance)
  • Keeping long-lived instances: Resource consumption becomes problematic
  • One instance per session: Not scalable for high-concurrency scenarios

Alternative Suggestions

If dynamic directory switching is not feasible, other solutions that would help:

  1. Significantly faster initialization (< 1s target)
  2. Built-in instance pooling/warm-up mechanism
  3. Lightweight "reset" method to reuse instances for different contexts (see the hypothetical sketch after this list)
  4. Better process lifecycle management APIs
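
To make item 3 concrete, here is a purely hypothetical sketch of such a reset; reset() does not exist in the SDK today, and its name and arguments are invented only to illustrate the request.

import asyncio

from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient

async def main() -> None:
    client = ClaudeSDKClient(options=ClaudeAgentOptions(cwd="/srv/base"))
    await client.connect()  # pay the startup cost once

    # Per request: drop conversation state and retarget the same warm process.
    # reset() is a hypothetical method, not part of the current SDK.
    await client.reset(cwd="/srv/workspaces/user-42")
    await client.query("New task in a fresh working directory")

asyncio.run(main())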

Use Case

Our server handles multiple users concurrently, each potentially starting new coding tasks. We need to:

  • Respond quickly to initial requests (< 2s target)
  • Maintain reasonable resource usage
  • Scale horizontally as user load increases

Currently, the SDK's architecture seems optimized for single-user CLI usage rather than multi-tenant server deployments.

Question for Maintainers

Is server-side multi-instance deployment a supported use case? If so, what are the recommended patterns for:

  • Instance lifecycle management
  • Resource optimization
  • Minimizing initialization overhead

Related Issues

  • #2166 - Initialization extremely slow
  • #3044 - SDK mode startup time
  • #10881 - Performance degradation in long sessions

Would appreciate any guidance or roadmap information on improving server-side deployment patterns. Thank you!

konglinghai123 · Nov 17 '25

Indeed, it's too slow. Starting from scratch, it takes 20 to 30 seconds from sending the first message to receiving a response, which is almost unacceptable.

tuywen · Nov 21 '25

We are facing the same problem. It would be easier if the ClaudeAgent could be kept warm and we could call agent.query with a session-id / set of tools, which would avoid the warmup (a rough sketch of that call shape is below).

Also, claude-agent-sdk seems to call out to Claude Code, which in turn writes sessions to disk. The net result is that there is no easy way to start a conversation on one deployment / container and continue it on another container unless both have access to the same file system.
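
Roughly the call shape we are imagining (entirely hypothetical; neither a warm shared ClaudeAgent object nor per-call session_id / allowed_tools arguments exist in this form today):

agent = ClaudeAgent()  # hypothetical: one warm, shared agent per container

await agent.query(
    "Continue the refactor from yesterday",
    session_id="user-42",            # resume an existing conversation
    allowed_tools=["Read", "Edit"],  # per-call tool selection
)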

gja · Dec 19 '25

We’re seeing the same class of issues in our own project as well.

In my personal project efka (https://github.com/Harryoung/efka), which integrates Claude Agent SDK in a server-side, multi-user setting, the current Client Pool–based concurrency approach leads to excessive resource consumption (CPU + memory) as concurrency increases.

Keeping multiple warm clients to serve concurrent users quickly becomes expensive and hard to control, while creating clients on demand is too slow to be practical. In practice, this makes the SDK difficult to use for real-world multi-tenant services.

This feels like an architectural mismatch: the SDK works well for single-user or CLI-style workflows, but lacks a first-class, efficient multi-user concurrency model for server deployments.

We would strongly appreciate official support for a more elegant solution, such as:

  • lightweight session/context switching on a shared warm agent,
  • built-in pooling with proper resource isolation,
  • or other patterns designed explicitly for high-concurrency, multi-user servers.

Happy to share concrete benchmarks or implementation details from efka if that would help.

Harryoung · Dec 20 '25