Support Tool Search and Programmatic Tool Use betas for reduced token consumption

Open matthewod11-stack opened this issue 3 weeks ago • 43 comments

Summary

Claude Code loads all tool definitions upfront at session start, which consumes significant context tokens - especially for users with multiple MCP servers, plugins, and agents configured. Anthropic has released beta features specifically designed to address this: Tool Search Tool and Programmatic Tool Calling.

These are documented at: https://www.anthropic.com/engineering/advanced-tool-use

Feature Request

Add support for the following API betas in Claude Code:

1. Tool Search Tool (tool-search-2025-04-15)

Allow tools to be marked with defer_loading: true so they remain discoverable without consuming context tokens at session start. Claude would discover relevant tools on-demand via a search mechanism.

Reported benefits:

  • 85% reduction in token usage while maintaining full tool access
  • Significant accuracy improvements (Opus 4: 49% → 74%, Opus 4.5: 79.5% → 88.1%)
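
As a rough illustration of the deferred-loading idea, the sketch below keeps only non-deferred tools in the upfront set and finds deferred ones through a regex over their names and descriptions. It is a local simulation, not the Anthropic API: the `Tool` class, `upfront_tools`, `search_tools`, and the sample tools are all hypothetical, and only the `defer_loading` flag is taken from the proposal above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool record; only `defer_loading` mirrors the proposal."""
    name: str
    description: str
    defer_loading: bool = False


def upfront_tools(tools):
    """Tools whose full definitions would be sent at session start."""
    return [t for t in tools if not t.defer_loading]


def search_tools(tools, pattern):
    """Find deferred tools whose name or description matches a regex."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [
        t for t in tools
        if t.defer_loading and (rx.search(t.name) or rx.search(t.description))
    ]


# Made-up catalog for illustration only.
TOOLS = [
    Tool("read_file", "Read a file from disk"),
    Tool("create_issue", "Create a GitHub issue", defer_loading=True),
    Tool("take_screenshot", "Capture a browser screenshot", defer_loading=True),
]

print([t.name for t in upfront_tools(TOOLS)])
print([t.name for t in search_tools(TOOLS, r"github|issue")])
```

In this toy catalog only one of three definitions is loaded upfront; the other two cost nothing until a search surfaces them.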

2. Programmatic Tool Calling (programmatic-tool-use-2025-04-15)

Allow Claude to orchestrate multiple tools through code execution rather than individual API round-trips, with only final results entering context.

Reported benefits:

  • 37% token reduction on complex multi-tool tasks
  • Eliminates inference overhead from multiple round-trips
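
The round-trip saving can be pictured with a small local simulation: the loop over tools runs in code, and only a compact summary would be handed back to the model. This is a sketch under assumptions, not the beta itself; `list_items`, `get_item`, and `orchestrate` are hypothetical stand-ins for real tools and for the code the model would write.

```python
import json


def list_items():
    """Hypothetical tool: returns the IDs of all records."""
    return [1, 2, 3, 4]


def get_item(item_id):
    """Hypothetical tool: returns one bulky record."""
    return {"id": item_id, "payload": "x" * 1000, "ok": item_id % 2 == 0}


def orchestrate():
    """Call many tools in code and return only a compact summary."""
    ids = list_items()
    failing = [i for i in ids if not get_item(i)["ok"]]
    return {"checked": len(ids), "failing_ids": failing}


summary = orchestrate()
# Size of everything that would have entered context with one call per round-trip.
intermediate_chars = sum(len(json.dumps(get_item(i))) for i in list_items())

print(summary)
print(len(json.dumps(summary)), "chars returned vs", intermediate_chars, "chars of intermediate output")
```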

Use Case

Users with extensive setups (multiple MCP servers like filesystem, github, puppeteer, brave-search, plus plugins with agents/skills/commands) are paying a substantial token cost on every session. These betas would allow:

  1. MCP server tools to defer loading until actually needed
  2. Plugin-defined tools/agents to use deferred discovery
  3. Complex multi-tool workflows to execute more efficiently

Proposed Implementation

  • Add configuration options (perhaps in settings.json or .claude/settings.json) to enable these betas for users who want them
  • Support defer_loading flag in MCP server tool configurations
  • Support allowed_callers for programmatic tool execution

Additional Context

Users with API/developer platform accounts already have access to these betas when using the API directly - this would bring that capability to Claude Code.

matthewod11-stack avatar Dec 01 '25 23:12 matthewod11-stack

For the love of God, PLEASE!

dknoodle avatar Dec 03 '25 22:12 dknoodle

pls anthropic

vmihalis avatar Dec 04 '25 14:12 vmihalis

tool search with RAG sounds promising, but tf-idf still works. just implemented a search-inspect-query pattern to deal with 100+ tools/query endpoints.. tf-idf is lightweight!

https://github.com/gyorilab/indra_cogex/pull/249
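
For readers who want to see what a lightweight tf-idf tool search could look like, here is a minimal pure-Python sketch. It is not the code from the linked PR; `tokenize`, `rank_tools`, the smoothing formula, and the sample catalog are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (so `execute_sql` -> execute, sql)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_tools(query, tools):
    """Rank tool names by summed TF-IDF of query terms found in name + description."""
    docs = {name: Counter(tokenize(name + " " + desc)) for name, desc in tools.items()}
    n_docs = len(docs)
    # Document frequency: in how many tool descriptions each term appears.
    df = Counter(term for counts in docs.values() for term in counts)
    scores = {}
    for name, counts in docs.items():
        total = sum(counts.values())
        score = 0.0
        for term in set(tokenize(query)):
            if term in counts:
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
                score += (counts[term] / total) * idf
        scores[name] = score
    matches = [name for name in scores if scores[name] > 0]
    return sorted(matches, key=lambda name: -scores[name])


# Made-up catalog for illustration only.
CATALOG = {
    "execute_sql": "Run a read-only SQL query against the database",
    "list_tables": "List tables in the database schema",
    "get_logs": "Fetch recent service logs",
}

print(rank_tools("run sql query", CATALOG))
```

A search step like this returns only tool names; the full schema would be fetched in a separate inspect step for the top hits.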

ejmockler avatar Dec 04 '25 14:12 ejmockler

My most frequently-used MCP server has ~20-30 tools that consume 73.9k tokens of context. 53k of that is consumed by 3 tools that I never use. The authors are working on a fix, but the ability to configure defer_loading or even completely disable them (right now, disabling tools doesn't prevent them from being loaded into context) myself without waiting for the authors would be a lifesaver!

mpiroc avatar Dec 04 '25 21:12 mpiroc

If you want to filter unused tools inside your mcp server, there's always something like https://github.com/TBXark/mcp-proxy that works pretty well

michabbb avatar Dec 04 '25 23:12 michabbb

I am implementing this feature using MCP. https://github.com/pleaseai/mcp-gateway

amondnet avatar Dec 05 '25 00:12 amondnet

Also, if someone is interested in saving tokens: https://docs.docker.com/ai/mcp-catalog-and-toolkit/toolkit/#how-the-mcp-toolkit-works

michabbb avatar Dec 05 '25 01:12 michabbb

@mpiroc In which Claude Code version are the tool search tool and programmatic tool calling enabled? Does 2.0.36 work?

ysong2123 avatar Dec 05 '25 08:12 ysong2123

I'm really looking forward to it. I hope it will be supported in the Claude Code CLI as well.

nimto avatar Dec 06 '25 19:12 nimto

Hi everyone! We are testing a more token efficient way for users to connect MCP servers to Claude Code:

Overview

MCP-CLI is an experimental approach to MCP tool calling that dramatically reduces token consumption in Claude Code. This means you can work with more tools and larger contexts without hitting limits, improving productivity across your development teams.

The Problem We're Solving

Many power users rely heavily on MCP servers in their daily workflow. However, popular MCP servers often consume substantial tokens by loading complete tool definitions into the system prompt. This leads to:

  • Reduced effective context length
  • More frequent context compactions
  • Limitations on how many MCP servers you can run simultaneously

How MCP-CLI Works

Instead of loading full tool definitions into the system prompt, MCP-CLI provides Claude with minimal metadata about each server and its tools. When Claude needs detailed information about a specific tool, it can request it on-demand through a separate set of commands. It then executes tool calls using MCP-CLI commands in the Bash tool.

Key advantages:

  • On-demand tool information: Only consume tokens for tools actually relevant to each session
  • Programmatic output processing: Claude can pipe large outputs (like JSON responses) directly to files or process them with tools like jq, keeping bulky data out of context
  • Scale to more tools: Load more MCP servers without sacrificing context space
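
The "programmatic output processing" advantage can be sketched in a few lines: a bulky JSON result is written to a file, and only a small projection of it is kept for the conversation. This is an illustrative analogue of piping output to a file or through jq, not how MCP-CLI is implemented; `spill_and_project` and the sample rows are hypothetical.

```python
import json
import os
import tempfile


def spill_and_project(payload, keys):
    """Write bulky JSON to a temp file; return the path plus selected fields only."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as fh:
        json.dump(payload, fh)
    projection = [{k: row.get(k) for k in keys} for row in payload]
    return path, projection


# Made-up tool output: three records, each with a large body.
rows = [{"id": i, "title": f"item {i}", "body": "x" * 5000} for i in range(3)]
path, compact = spill_and_project(rows, ["id", "title"])

print(compact)
print(len(json.dumps(compact)), "chars kept vs", len(json.dumps(rows)), "chars on disk")
os.remove(path)  # clean up the temp file
```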

How to use: The ENABLE_EXPERIMENTAL_MCP_CLI=true env var controls whether MCP-CLI is switched on for the current session. If you run into any limitations, you can always unset the env var.

Please test this out and share your feedback in this thread!

catherinewu avatar Dec 08 '25 21:12 catherinewu

@catherinewu WOW! Zero context for 21 tools!?!?!?!? Where can I send the Christmas gift???

Works great in bash. For Windows users, it works great in WSL, or add $env:ENABLE_EXPERIMENTAL_MCP_CLI = "true" to your PowerShell profile!

dknoodle avatar Dec 08 '25 22:12 dknoodle

@catherinewu may I know which Claude Code version enabled this setting? ENABLE_EXPERIMENTAL_MCP_CLI=true

ysong2123 avatar Dec 09 '25 03:12 ysong2123

@ysong2123 Claude Code versions 2.0.56 and later support this env var. That said, we strongly recommend using the latest version, 2.0.62, since it includes many bug fixes from the last few weeks.
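
A quick way to check a version against that minimum is a tuple comparison. This sketch assumes the version is a plain `X.Y.Z` string; `parse_version` and `supports_mcp_cli` are hypothetical helpers, not part of Claude Code.

```python
# Minimum version stated in this thread for ENABLE_EXPERIMENTAL_MCP_CLI.
MIN_MCP_CLI_VERSION = (2, 0, 56)


def parse_version(text):
    """Turn '2.0.56' into (2, 0, 56); assumes a plain dotted numeric string."""
    return tuple(int(part) for part in text.strip().split("."))


def supports_mcp_cli(version):
    """True if the given version is at or above the stated minimum."""
    return parse_version(version) >= MIN_MCP_CLI_VERSION


for v in ("2.0.36", "2.0.56", "2.0.62"):
    print(v, supports_mcp_cli(v))
```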

catherinewu avatar Dec 09 '25 04:12 catherinewu

Real-World Use Case: MCP-Heavy Workflow Automation

I'm building SalesBriefAI, a B2B SaaS platform that orchestrates complex workflows across multiple systems. My Claude Desktop setup includes:

  • n8n-MCP server (~17 tools for workflow automation)
  • Supabase-MCP server (~15 tools for database operations)
  • Google Drive integration (2 tools for documentation)
  • Standard tools (web search, computer use, memory)

Total: ~43 tools consuming an estimated 38-46K tokens before any conversation starts.

Why Tool Search Tool Would Be Transformative

In practice, most conversations use only 3-5 tools:

  • Workflow maintenance → get_node, n8n_update_partial_workflow, n8n_get_workflow
  • Database work → execute_sql, list_tables, apply_migration
  • Documentation → google_drive_fetch, view

But I pay the full 40K+ token cost on every session, even when I'm just asking Claude to help me debug a single n8n expression.

Candidates for defer_loading: true in my setup:

  • Template tools (search_templates, get_template, n8n_deploy_template) — used maybe 5% of sessions
  • Branch management tools (create_branch, merge_branch, rebase_branch) — used rarely
  • Bulk operations (n8n_delete_workflow, n8n_workflow_versions) — situational
  • Diagnostics (get_logs, get_advisors) — on-demand only

An 85% token reduction would recover ~35K tokens for actual work—that's the difference between hitting context limits mid-task and completing complex multi-workflow operations.
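
The ~35K figure follows from applying the reported 85% reduction to the estimated 38-46K baseline. The arithmetic below assumes the reduction applies uniformly to the whole tool-definition overhead, which is a simplification; `recovered_tokens` is a hypothetical helper.

```python
def recovered_tokens(baseline_tokens, reduction=0.85):
    """Tokens freed if `reduction` of the tool-definition overhead disappears."""
    return round(baseline_tokens * reduction)


low = recovered_tokens(38_000)
high = recovered_tokens(46_000)
print(low, high, (low + high) // 2)
```

Under that assumption the recovered range is roughly 32K to 39K tokens, with ~35K near the middle.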

Programmatic Tool Calling Use Case

I frequently need to audit patterns across 18+ n8n workflows:

  • "Find all workflows missing error handling on HTTP Request nodes"
  • "Check which workflows are using deprecated node versions"
  • "Audit email template usage across the system"

Currently, these tasks either exhaust my context window or require delegation to external tools. PTC would let Claude orchestrate n8n_list_workflows → n8n_get_workflow (×18) → analysis without intermediate JSON bloating context.

What I Built as a Workaround

To handle large-context analysis tasks, I've built a custom [Agent] Workflow Auditor—an n8n workflow that uses Gemini 2.5 Flash as an AI agent with 8 specialized helper workflows as tools:

| Tool | Purpose |
| --- | --- |
| Fetch Workflow | Get complete workflow JSON by ID |
| List Workflows | Enumerate all 79 n8n workflows |
| Execute SQL | Run read-only database queries |
| Get App Config | Retrieve JSONB configuration |
| Check RPC Function | Verify database functions exist |
| List Supabase Tables | Enumerate schema (27 tables) |
| List Email Templates | List all 9 email templates |
| Get Node Version | Check node type versions |
Claude calls this agent via webhook when context exhaustion is imminent. Gemini handles the bulk analysis with its larger context window, then returns a synthesized report.

This works, but it's a significant engineering investment to solve what these native features would handle automatically. Native Tool Search Tool and PTC support would eliminate the need for this custom infrastructure entirely.

Request: Claude Desktop Support

I notice this issue focuses on Claude Code, but Claude Desktop with MCP servers is where I experience this pain most acutely. The same features would be equally valuable there:

  1. Tool Search Tool — Let MCP server configs specify defer_loading per tool
  2. Programmatic Tool Calling — Enable code-based orchestration for batch MCP operations

The architectural pattern is identical; it's just a different client surface.

Would love to see these features land in both Claude Code and Claude Desktop. Happy to provide more detailed telemetry or test beta implementations if helpful.


Stack details for context:

  • 27 n8n workflows (18 core + 1 agent + 8 helpers)
  • Supabase PostgreSQL with RPC functions and JSONB configuration
  • Token-optimized documentation system with 8 Agent Skills
  • Custom Workflow Auditor using Gemini for large-context delegation (workaround for native feature gap)

bvanorsdel avatar Dec 09 '25 05:12 bvanorsdel

@catherinewu It seems similar to the MCP+CLI method I implemented, but more convenient. Thank you.

amondnet avatar Dec 09 '25 05:12 amondnet

> What mcp tools do i have available? 

⏺ Bash(mcp-cli tools)
  ⎿  Error: Exit code 127
     (eval):1: command not found: mcp-cli

I didn't see an install command or package anywhere - am i missing something?

StreamlinedStartup avatar Dec 09 '25 06:12 StreamlinedStartup

Tip for Linux/Unix users: Check your actual shell!

If you've set ENABLE_EXPERIMENTAL_MCP_CLI=true but /context still shows high MCP token usage, you might be configuring the wrong shell.

The problem: Your terminal emulator (Konsole, iTerm, etc.) might use a different shell than you think. For example, I had configured ~/.zshrc but my Konsole profile was set to use fish - so the variable was never exported to Claude.

Quick diagnosis:

# Find what shell is actually running Claude
ps -o ppid= -p $(pgrep -n claude) | xargs ps -o comm= -p

If it says fish but you configured zsh (or vice versa), that's your problem.

Setup per shell:

# Fish (~/.config/fish/config.fish)
set -gx ENABLE_EXPERIMENTAL_MCP_CLI true

# Bash (~/.bashrc) or Zsh (~/.zshrc)
export ENABLE_EXPERIMENTAL_MCP_CLI=true

Verify it's actually set before launching:

# In your terminal, before running claude:
echo $ENABLE_EXPERIMENTAL_MCP_CLI
# Should output: true

Or test without config changes:

ENABLE_EXPERIMENTAL_MCP_CLI=true claude

Hope this saves someone the debugging time!

bitr8 avatar Dec 09 '25 07:12 bitr8

@catherinewu I enabled this fantastic feature locally. Thanks! One quick question: is ENABLE_EXPERIMENTAL_MCP_CLI built on top of the tool search tool and programmatic tool calling, or is it a brand-new feature? Is there any official documentation for it?

ysong2123 avatar Dec 09 '25 09:12 ysong2123

Does ENABLE_EXPERIMENTAL_MCP_CLI work with the agent sdk as well?

ShawnSack avatar Dec 11 '25 02:12 ShawnSack

@ShawnSack I verified this locally, and I can see: "AssistantMessage(content=[ToolUseBlock(id='toolu_vrtx_01Ly3z5Q3WveqVCafc4uaJBT', name='Bash', input={'command': 'mcp-cli tools', 'description': 'List all available MCP tools'})], model='claude-haiku-4-5-20251001', parent_tool_use_id=None, error=None)"

mcp-cli tools has been invoked for sure! [I already set up ENABLE_EXPERIMENTAL_MCP_CLI in ~/.zshrc]

ysong2123 avatar Dec 11 '25 03:12 ysong2123

It seems the timeout option doesn't work on the Windows native version (with Git for Windows). After I connected to the codex-cli MCP (calling Codex CLI to assist Claude Code) and sent long code snippets (so the response takes a while), it says 'the socket connection was disconnected unexpectedly..'. I checked whether the error comes from the MCP server itself, but found that it comes from mcp-cli itself. So I tried the timeout flag, but it doesn't work; the same error shows up again.

Also, in another terminal where I don't set the ENABLE_EXPERIMENTAL_MCP_CLI flag, logging in to Claude Code shows all the MCP servers as connected (including ones that are disabled globally).

mysehyunhope avatar Dec 14 '25 03:12 mysehyunhope

Previously, the Atlassian, GitHub and Sentry MCPs I use took up a whopping 64.6k tokens. I then added export ENABLE_EXPERIMENTAL_MCP_CLI=true to my .bashrc and restarted Claude Code. The /context command now shows no token usage for MCPs at all.

Discovery and usage of a couple of MCP tools worked flawlessly. mcp-cli servers correctly listed my available servers. Claude was able to deduce the correct tools on its first try when all I did was give it a couple of URLs corresponding to each of the services and tell it to fetch the details. mcp-cli info worked nicely to check the correct schema, then mcp-cli call used them correctly and delivered the result back to me. Consumed context after all this was 16.2k tokens in messages, a lot less than what the MCPs initially used!

spawnia avatar Dec 15 '25 08:12 spawnia

@catherinewu I found a bug with mcp-cli. It doesn't seem to work when resuming existing sessions:

(new session)

> mcp-cli servers

⏺ Bash(mcp-cli servers)
  ⎿  (No content)

This creates .endpoint file with the current session ID - 644e1fb9-0791-48a0-bd24-4c73345a22a5.endpoint

When resuming with the same session ID above -

> mcp-cli servers

⏺ Bash(mcp-cli servers 2>&1)
  ⎿  Error: Exit code 1
     Error: Connection refused - is the MCP endpoint running?

This creates a new .endpoint file with a different ID - 37afe846-8e5e-46a6-b84c-f1e034fc7f58.endpoint. I assume there is a bug where mcp-cli is not reusing the correct endpoint file.

prpatel05 avatar Dec 15 '25 16:12 prpatel05

Hi all - we added support for tool search behind an env var. To try it, please set ENABLE_TOOL_SEARCH=true and make sure that ENABLE_EXPERIMENTAL_MCP_CLI=false. Please let us know if you run into any issues!

catherinewu avatar Dec 15 '25 17:12 catherinewu

> Hi all - we added support for tool search behind an env var. To try it, please set ENABLE_TOOL_SEARCH=true and make sure that ENABLE_EXPERIMENTAL_MCP_CLI=false. Please let us know if you run into any issues!

Hi @catherinewu, thanks! Tried out as of v2.0.70. Couple of things so far:

  1. It uses advanced-tool-use-2025-11-20 by default, which is only compatible with the Claude API and Foundry: https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool
  2. I'm using Claude Code with Vertex, so I tried forcibly switching the header to tool-search-tool-2025-10-19 under the hood, but that just results in the MCPSearch calls failing with 400 API errors like:

API Error: 400 {"type":"error","error":{"type":"invalid_request_error","message":"Tool reference 'mcp__slack__conversations_history' not found in available tools"},"request_id":"req_vrtx_011CW9SWe7ZN1Rc9PMBEBzfT"}

The tool is definitely available.

Is this not meant to be used with vertex / bedrock yet? Will stick to ENABLE_EXPERIMENTAL_MCP_CLI for now, ty

ts-shu avatar Dec 16 '25 00:12 ts-shu

> Hi all - we added support for tool search behind an env var. To try it, please set ENABLE_TOOL_SEARCH=true and make sure that ENABLE_EXPERIMENTAL_MCP_CLI=false. Please let us know if you run into any issues!

From my understanding, MCP CLI and the tool search tool both defer loading in a similar way. Is the only difference that ENABLE_EXPERIMENTAL_MCP_CLI proactively discovers related tools from MCP servers, while ENABLE_TOOL_SEARCH only works when paired with defer_loading=true? If not, what's the difference?

ysong2123 avatar Dec 16 '25 05:12 ysong2123

Thanks for reporting! We're looking into the Bedrock/Vertex issues 🙏

catherinewu avatar Dec 16 '25 19:12 catherinewu

Thanks @catherinewu, MCP CLI is the feature I've been waiting for! I do have a bug report: I've been getting this error when trying to /compact since enabling the MCP CLI beta. I don't think it happens in every conversation, just some of them:

  Error: Error during compaction: Error: API Error: 400 {"type":"error","error":{"type":"invalid_request_error","message":"tools: Tool names must be unique."},"request_id":"req_011CWBJErmCaM1TfeL1sJ4b9"}

mpiroc avatar Dec 17 '25 00:12 mpiroc

No compacting issues with latest version (2.0.71) and export ENABLE_EXPERIMENTAL_MCP_CLI=true.

cmin764 avatar Dec 17 '25 14:12 cmin764

Warning: Both ENABLE_TOOL_SEARCH and ENABLE_EXPERIMENTAL_MCP_CLI are set to true. These are mutually exclusive. Using Tool Search mode.

Can someone please explain which key does what and how we should use them? I'm a bit confused after all the posts here 🙈 thanks 😏
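
Going only by what has been posted in this thread, the two flags appear to resolve as in the sketch below: Tool Search wins when both are set, as the warning says. This is a guess at the observable behavior, not Claude Code's actual logic; `is_true`, `resolve_mode`, and the mode names are hypothetical, and treating only the literal string "true" as enabled is an assumption.

```python
import os


def is_true(env, name):
    """Assumption: only the literal string 'true' (any case) enables a flag."""
    return env.get(name, "").strip().lower() == "true"


def resolve_mode(env):
    """Pick a mode from the two env vars discussed in this thread."""
    if is_true(env, "ENABLE_TOOL_SEARCH"):
        # Matches the warning: when both are set, Tool Search mode is used.
        return "tool_search"
    if is_true(env, "ENABLE_EXPERIMENTAL_MCP_CLI"):
        return "mcp_cli"
    return "default"  # all tool definitions loaded upfront


print(resolve_mode(os.environ))
print(resolve_mode({"ENABLE_TOOL_SEARCH": "true", "ENABLE_EXPERIMENTAL_MCP_CLI": "true"}))
```

In short: set ENABLE_EXPERIMENTAL_MCP_CLI=true for the Bash-based mcp-cli approach, or ENABLE_TOOL_SEARCH=true (with the other flag set to false) for tool search, but not both.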

michabbb avatar Dec 17 '25 15:12 michabbb