claude-code icon indicating copy to clipboard operation
claude-code copied to clipboard

Feature Request: Lazy Loading for MCP Servers and Tools (95% context reduction possible)

Open machjesusmoto opened this issue 4 months ago β€’ 22 comments

Feature Request: Lazy Loading for MCP Servers and Tools

Problem Statement

Currently, Claude Code loads all configured MCP servers, tools, and agents at session startup, consuming significant context before any conversation begins. In my environment:

  • MCP tools: 39.8k tokens (19.9%)
  • Custom agents: 9.7k tokens (4.9%)
  • System tools: 22.6k tokens (11.3%)
  • Memory files: 36.0k tokens (18.0%)
  • Total: ~108k tokens (54% of 200k limit)

This leaves only 92k tokens for actual conversation and work, severely limiting complex tasks.

Proposed Solution

Implement lazy loading for MCP servers and tools, loading them only when needed based on conversation context.

Core Features

  1. Lightweight Registry System

    • Load only a small index (~5k tokens) at startup
    • Registry contains tool names, descriptions, and trigger keywords
    • Tools load on-demand when keywords are detected
  2. Intelligent Loading

    • Analyze user input for relevant keywords
    • Load only required tools for the task
    • Cache loaded tools for session duration
    • Preload related tools that commonly work together
  3. Configuration Options

    {
      "optimization": {
        "lazyLoading": true,
        "maxInitialTokens": 5000,
        "autoLoadThreshold": 0.8,
        "cacheMinutes": 30
      },
      "mcpServers": {
        "example-server": {
          "command": "...",
          "lazyLoad": true,
          "triggers": ["keyword1", "keyword2"],
          "preloadWith": ["related-server"]
        }
      }
    }
    

Benefits

  • 95% Token Reduction: From 108k to ~5k initial tokens
  • Longer Conversations: 195k tokens available vs 92k currently
  • Better Performance: Faster startup, lower memory usage
  • Scalability: Can add more tools without context penalty

Implementation Suggestion

Phase 1: Basic Lazy Loading

  • Add lazyLoad flag to MCP server configs
  • Load registry instead of full tool definitions
  • Implement on-demand loading when tool is called

Phase 2: Intelligent Preloading

  • Keyword-based auto-loading
  • Pattern recognition for common workflows
  • Tool relationship mapping

Phase 3: Advanced Optimization

  • Session-based learning of tool usage patterns
  • Predictive preloading based on project context
  • Dynamic unloading of unused tools

User Experience

Before (Current)

Starting session...
Loading 73 MCP tools... [39.8k tokens]
Loading 56 agents... [9.7k tokens]
Loading system tools... [22.6k tokens]
Ready with 92k tokens remaining.

After (With Lazy Loading)

Starting session...
Loading tool registry... [5k tokens]
Ready with 195k tokens available.

User: "I need to build a React component"
> Auto-loading: context7, magic [+3.5k tokens]
> 191.5k tokens remaining

Test Case

I've already built a proof-of-concept in ~/.claude/optimization/ that demonstrates:

  • Registry generation from existing MCP configs
  • Keyword extraction and mapping
  • Lazy loader with pattern matching
  • 95% token reduction achieved

Files available for reference:

  • tool-registry.json - Example registry structure
  • lazy-loader.py - Proof of concept implementation
  • generate-index.py - Registry generation logic

Priority

High - This is a critical limitation for power users with many tools and complex workflows. The current 54% context consumption at startup makes many advanced use cases impossible.

Similar Products

  • VSCode: Lazy loads extensions based on file types and activation events
  • JetBrains IDEs: Load plugins on-demand based on project type
  • Vim/Neovim: Lazy loading plugins (lazy.nvim, vim-plug with lazy loading)

Contact

Happy to provide more details, test beta implementations, or share the proof-of-concept code.


Submitted by: dtaylor Date: July 9, 2025 Claude Code Version: [Current Version] Impact: Affects all users with multiple MCP servers configured

machjesusmoto avatar Sep 09 '25 07:09 machjesusmoto

Found 3 possible duplicate issues:

  1. https://github.com/anthropics/claude-code/issues/6638
  2. https://github.com/anthropics/claude-code/issues/7172
  3. https://github.com/anthropics/claude-code/issues/3036

This issue will be automatically closed as a duplicate in 3 days.

  • If your issue is a duplicate, please close it and πŸ‘ the existing issue instead
  • To prevent auto-closure, add a comment or πŸ‘Ž this comment

πŸ€– Generated with Claude Code

github-actions[bot] avatar Sep 09 '25 07:09 github-actions[bot]

I've created a proof-of-concept implementation demonstrating this lazy loading system:

πŸ”— Repository: https://github.com/machjesusmoto/claude-lazy-loading

The implementation shows:

  • βœ… 95% token reduction (from 108k to 5k)
  • βœ… Working registry generation from MCP configs
  • βœ… Intelligent keyword-based loading
  • βœ… Preload profiles for common workflows
  • βœ… Real-time token tracking

This could be integrated into Claude Code to dramatically improve the experience for power users with multiple MCP servers configured.

machjesusmoto avatar Sep 09 '25 07:09 machjesusmoto

@machjesusmoto How does it work? Does it change what mcps / tools are visible when the /mcp command is invoked?

lukemmtt avatar Sep 09 '25 15:09 lukemmtt

I think this is definitely a must to properly support MCP servers. I was thinking just giving defined Agents the ability to keep their own MCP server references (and being able to select only a sub-set of tools from any given MCP server). Then, for example, if I wanted to implement one of my Linear tasks it would call an Agent ("Implement Linear task KEY-123") with the MCP tools in its context for that specific task and not bloat the main thread with context.

wizardlyluke avatar Sep 10 '25 16:09 wizardlyluke

Thank you for the feedback! Let me address the questions:

@lukemmtt - How it works

The proof-of-concept creates a lightweight registry (~500 tokens) that replaces loading all MCP tools upfront:

  1. At startup: Only the registry loads (5k tokens vs 108k)
  2. During conversation: The system analyzes input for keywords
  3. On-demand loading: Only required tools are loaded when detected
  4. Caching: Loaded tools stay in memory for the session

Currently, it's a simulation showing what's possible. The /mcp command would still show all configured servers, but only the registry would consume tokens until tools are actually needed.

@wizardlyluke - Agent-specific MCP servers

That's a brilliant approach! Agent-scoped MCP servers would be even more efficient. My implementation could extend to support:

{
  "agents": {
    "linear-implementer": {
      "mcp_servers": ["linear", "github"],
      "auto_load": false
    }
  }
}

Regarding Duplicates

While related issues exist (#6638, #7172, #3036), this implementation offers:

  1. Working code: Not just a request, but functioning proof-of-concept
  2. 95% reduction achieved: Concrete metrics showing feasibility
  3. Keyword-based loading: Intelligent detection beyond manual control
  4. Registry approach: Minimal overhead solution that scales

The key differentiator is having working code that demonstrates the solution. This could help accelerate implementation by providing a reference.

Test It Yourself

git clone https://github.com/machjesusmoto/claude-lazy-loading.git
cd claude-lazy-loading
python3 optimization/lazy-loader.py stats

Happy to collaborate with others experiencing this issue to refine the approach!

machjesusmoto avatar Sep 10 '25 17:09 machjesusmoto

Regarding Duplicate Status

After reviewing the related issues, I believe this should remain open as it provides unique value:

Issue Comparison

#3036 - Reports the problem ("MCP servers eat context") #6638 - Requests dynamic loading/unloading
#7172 - Proposes token management improvements

#7336 (This) - Provides working implementation with:

  • πŸ“¦ Functional code: https://github.com/machjesusmoto/claude-lazy-loading
  • πŸ“Š Proven metrics: 95% reduction achieved
  • πŸ”§ Registry approach: Novel solution using lightweight index
  • 🎯 Keyword detection: Intelligent loading based on context

Why Keep Open

  1. Solution, not just problem: Others identify the issue; this provides a solution
  2. Reference implementation: Gives Claude team concrete code to work from
  3. Community testable: Others can validate the approach with their configs
  4. Different approach: Registry-based vs manual loading/unloading

I suggest we could:

  • Link these issues together as "related"
  • Keep this open as the "implementation reference"
  • Use others for requirements gathering

What do you think @lukemmtt @wizardlyluke? Should we consolidate discussion or keep the implementation separate?

machjesusmoto avatar Sep 10 '25 17:09 machjesusmoto

Seems like a neat implementation, I'll give it a shot this week.

I've personally been getting by with less refined approach: a custom-built script that I made that adds/removes mcps from the actual claude config files (using a separate json file to persistently store the mcp configs)β€”the obvious caveat being that I still need to restart the claude code session after each change for it to take effect; your solution sounds much more elegant.

As for consolidating the discussions or not, that's reasonable, but more importantly would be to just mention your solution in the other discussions to tie everything together, and maybe share it in the anthropic / claude / claude code subreddits for visibility, rather than wait for moderator intervention, which seems sparse in this noisy forum.

lukemmtt avatar Sep 10 '25 17:09 lukemmtt

@lukemmtt Thanks for trying it out! Your config-swapping approach is clever - it's actually complementary to what I built.

Current Limitations

You're absolutely right that this can't truly lazy-load without Claude Code native support. What it does:

  1. Shows what's possible: Demonstrates 95% reduction is achievable
  2. Simulates the behavior: Shows which tools would load for given inputs
  3. Provides the blueprint: Registry structure and loading logic ready for integration

Combining Approaches

Your script + this registry could work together:

# Use lazy-loader to analyze what's needed
python3 lazy-loader.py analyze "building React components today"
# Output: Would load context7, magic

# Use your script to update config with just those MCPs
your-script enable context7 magic

# Restart Claude with minimal MCPs loaded

Next Steps

Great idea about cross-posting! I'll:

  1. βœ… Comment on related issues (#3036, #6638, #7172) - Done!
  2. βœ… Post to r/ClaudeLLM and r/AnthropicClaude
  3. βœ… Share in Claude Discord if there's a channel for it

The real win would be getting this integrated natively. Until then, your config-swapping + my analysis tool might be the best workaround.

Would you be interested in combining efforts? We could:

  • Add config manipulation to the lazy-loader
  • Auto-generate minimal configs based on task analysis
  • Create a wrapper that restarts Claude with optimized MCPs

machjesusmoto avatar Sep 10 '25 19:09 machjesusmoto

lol... @machjesusmoto , you got me. Freeze all motor functions

lukemmtt avatar Sep 10 '25 19:09 lukemmtt

It is worth noting that a runtime MCP Toggle feature has been introduced in Claude Code 2.0.10:

2.0.10

  • Rewrote terminal renderer for buttery smooth UI
  • Enable/disable MCP servers by @mentioning, or in /mcp
  • Added tab completion for shell commands in bash mode
  • PreToolUse hooks can now modify tool inputs
  • Press Ctrl-G to edit your prompt in your system's configured text editor
  • Fixes for bash permission checks with environment variables in the command

Dynamically-loading MCPs is still an interesting concept, but the new ability to enable by "@mentioning" is almost equally valuable for my needs.

lukemmtt avatar Oct 10 '25 18:10 lukemmtt

It is worth noting that a runtime MCP Toggle feature has been introduced in Claude Code 2.0.10:

2.0.10

  • Rewrote terminal renderer for buttery smooth UI
  • Enable/disable MCP servers by @mentioning, or in /mcp
  • Added tab completion for shell commands in bash mode
  • PreToolUse hooks can now modify tool inputs
  • Press Ctrl-G to edit your prompt in your system's configured text editor
  • Fixes for bash permission checks with environment variables in the command

Dynamically-loading MCPs is still an interesting concept, but the new ability to enable by "@mentioning" is almost equally valuable for my needs.

Wouldn't this feature be the equivalent (though faster and not requiring Claude Code restart) of just removing/adding them manually? The true power of using many MCPs will be when you can get them added per defined agent, or allow the main agent to enable/disable them itself by knowing which MCPs it could have available.

wizardlyluke avatar Oct 12 '25 15:10 wizardlyluke

Yes exactly, I would like my main agent to not have any mcp servers and then have specialized research agents etc.. who have mcp severs and use them for example for more detailed research (github mcp, linear, context7) and then report back a condensed summary to the main agent. So the main session always stays clean

EugenEistrach avatar Oct 12 '25 19:10 EugenEistrach

2.0.10 made my work on machjesusmoto / mcp-toggle redundant, but then they delivered plugin extensibility in 2.0.12. A plugin system makes makes the project's overarching goal possible when it was previously only a theoretical PoC (machjesusmoto / claude-lazy-loading So, with a lot of caffeine and a little luck, I'll have functional "lazy loading & unloading with a registry being all that's held in context at launch" code to test soon.

machjesusmoto avatar Oct 12 '25 21:10 machjesusmoto

I'd like to second @EugenEistrach 's idea of being able to enable select MCPs for specific subagents, while leaving it disabled for the main agent. Currently if you disable it in the main agent, then you cannot enable it for any subagent. I understand that there is a security aspect here so subprocesses can't have great privileges than parent processes. I suppose a true lazy-load would work, but something deterministic would be ideal.

bbaran-tyler avatar Oct 21 '25 20:10 bbaran-tyler

https://www.anthropic.com/engineering/advanced-tool-use Hopefully support for this gets added to Claude Code soon 🀞

psjamesh avatar Nov 26 '25 11:11 psjamesh

Update: Anthropic Released an Official API Solution

On November 24, 2025, Anthropic released the Tool Search Tool beta feature that directly addresses this problem.

How It Works

Approach Token Usage
Traditional (all tools upfront) ~77K tokens
With Tool Search Tool ~8.7K tokens
Reduction 85%

Accuracy Improvements

  • Opus 4: 49% β†’ 74%
  • Opus 4.5: 79.5% β†’ 88.1%

API Usage

client.beta.messages.create(
    betas=["advanced-tool-use-2025-11-20"],
    tools=[
        {"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"},
        {
            "name": "mcp__server__tool",
            "description": "...",
            "input_schema": {...},
            "defer_loading": True  # Not loaded until searched
        }
    ]
)

Proposed Claude Code Integration

Since this is now a first-party Anthropic feature, it would be great to see Claude Code CLI support via:

Option 1 - Global config:

{
  "betaFeatures": ["advanced-tool-use-2025-11-20"],
  "toolSearch": {
    "enabled": true,
    "defaultDeferLoading": true
  }
}

Option 2 - Per-server config:

{
  "mcpServers": {
    "my-server": {
      "command": "...",
      "deferLoading": true,
      "alwaysLoadTools": ["critical_tool"]
    }
  }
}

Option 3 - Auto-enable when MCP tool count exceeds a threshold (e.g., 20+ tools)

References

merlinrabens avatar Dec 01 '25 11:12 merlinrabens

+1 to see how and when claude code will implement this

grhaonan avatar Dec 04 '25 02:12 grhaonan

@ everyone try this: https://github.com/anthropics/claude-code/issues/12836#issuecomment-3629052941

Basically:

  1. Add the following to your shell file (~/.zshrc, ~/.bashrc, etc)
# Claude Code - Enable experimental MCP-CLI for reduced token consumption
export ENABLE_EXPERIMENTAL_MCP_CLI=true
  1. Make sure you have claude code version > 2.0.62.
  2. Enjoy what you see in /context 😊

merlinrabens avatar Dec 14 '25 09:12 merlinrabens

Running into the same issue and gald to see this work and active development, I would like to chip in an idea on adding a lightweight router model for tool selection for your consideration.

95% token reduction already is awesome, but strict adherence to predefined keywords might hinder some users. Thus -->

Proposed Enhancement: Phase 4 - Intelligent AI Router

The Problem with Static Keyword Matching

While keyword triggers work well for explicit mentions (e.g., "docker" β†’ load docker-mcp), they have limitations:

  • Implicit requests: "Show me what containers are running" (no "docker" keyword)
  • Multi-domain tasks: "Email the team about the database migration" (needs both gmail + database tools)
  • Ambiguous keywords: "push" could mean git, docker, notifications, etc.
  • Manual configuration: Requires users to define all triggers upfront

Solution: Two-Phase Architecture with Lightweight Model Router

Phase 1: Lightweight Router (Haiku/Fast Model)

User prompt β†’ Haiku analyzes intent
              ↓
         Tool catalog scan (name + description only)
              ↓
         Intelligent tool selection
              ↓
         Returns minimal tool set

Phase 2: Main Model (Sonnet/Opus)

User prompt + selected tools only β†’ Execute task

How It Works

1. At Session Start

// Haiku loads ultra-lightweight tool catalog (~2-3k tokens)
{
  "catalog": [
    {
      "server": "docker-mcp",
      "tool": "list-containers",
      "description": "List all Docker containers",
      "tags": ["docker", "containers", "processes"]
    },
    {
      "server": "google-workspace",
      "tool": "send_gmail_message",
      "description": "Send an email via Gmail",
      "tags": ["email", "gmail", "communication"]
    }
    // ... all tools as lightweight entries
  ]
}

2. On User Input

User: "Show me what containers are running and email the status to [email protected]"

Haiku Router:
1. Analyzes prompt semantics (not just keywords)
2. Identifies intents: [container_listing, email_sending]
3. Searches catalog for relevant tools
4. Returns: [docker-mcp::list-containers, google-workspace::send_gmail_message]

Main Model:
- Receives only 2 tool definitions (~1k tokens)
- Executes task with 98%+ context efficiency

3. Fallback Mechanism

If Router misses tools:
  β†’ Main model requests additional tools mid-conversation
  β†’ Router re-analyzes with conversation context
  β†’ Tools load on-demand
  β†’ No session restart needed

Integration with Your Registry System

Your POC already has the foundation! The enhancement would be:

{
  "optimization": {
    "lazyLoading": true,
    "loadingStrategy": "ai-router",  // NEW: "keywords" | "ai-router" | "hybrid"
    "routerModel": "haiku",          // NEW: Fast, cheap model for routing
    "fallbackToKeywords": true,      // NEW: Use keywords if router unavailable
    "maxInitialTokens": 3000,
    "cacheMinutes": 30
  },
  "mcpServers": {
    "docker-mcp": {
      "lazyLoad": true,
      "triggers": ["docker", "container"],  // Fallback keywords
      "semanticTags": ["containerization", "processes", "services"]  // NEW: For AI router
    }
  }
}

Implementation Phases

Phase 4.1: Basic AI Router

  • Haiku analyzes user prompt
  • Selects tools from catalog based on semantic understanding
  • Main model receives selected tools only

Phase 4.2: Context-Aware Routing

  • Router considers conversation history
  • Predictive preloading based on workflow patterns
  • Learning from tool usage (which tools actually get used)

Phase 4.3: Hybrid Intelligence

  • Combines AI routing + your keyword triggers
  • Falls back to keywords if AI router is uncertain
  • User can override with explicit @server mentions

Benefits Beyond Keyword Matching

Feature Keyword Matching AI Router
Explicit mentions βœ… Excellent βœ… Excellent
Implicit requests ❌ Misses βœ… Handles
Multi-domain tasks ⚠️ Partial βœ… Optimizes
Context awareness ❌ No βœ… Yes
Ambiguity resolution ❌ No βœ… Yes
Setup effort ⚠️ Manual triggers βœ… Zero config
Token overhead ~5k registry ~3k catalog

Real-World Example

Scenario: "Check if the deployment is healthy and notify the team"

Keyword Approach:

Triggers: deployment? notify? team?
β†’ Might load all of: kubernetes, docker, slack, email, github
β†’ 5-10k tokens for tools user might not need

Discussion Questions

  1. Would you consider such an AI router enhancement to your POC?
  2. If so, would you think of it as a plugin to your registry system or rather a integrated core feature?
  3. Any concerns about the additional Haiku API call overhead vs. token savings?

Technical Notes

Edge Cases to Handle:

  • Router uncertainty β†’ load broader tool set + use conversation context for refinement
  • Consider user overrides to limit tool set to consider by router model by limiting to specific mcp servers by name β†’ Explicit @server mentions limit to these tools considered by router model which then subsets to the tools used for main model call
  • Offline/API unavailable/model uncertainty what the best choice would be or lack of adequate tools among the user-subset mcp server(s) β†’ Fall back to keyword matching

Looking forward to hear if this might be of interest and add useful functionality to the keyword registry approach!

heuselm avatar Dec 22 '25 08:12 heuselm