[Feature Request] Support for MCP Sampling to leverage Claude Max subscriptions and reduce API costs
Problem
Currently, Claude Code, acting as an MCP (Model Context Protocol) client, does not support the MCP "sampling" feature (per the client feature support matrix at https://github.com/modelcontextprotocol/docs/blob/main/clients.mdx). This means that any MCP server connected to Claude Code that requires an LLM inference must make a direct call to the Anthropic API.
While many Claude Code users benefit from the nearly unlimited usage provided by a Claude Max subscription, this direct API inference leads to additional "pay-as-you-go" costs for every server-initiated request. This creates an unexpected financial burden and complexity for users who assumed their Max subscription would cover the majority of their Claude usage, including server-side interactions.
Solution
Claude Code should be updated to fully support the MCP "sampling" feature: https://modelcontextprotocol.io/specification/2025-03-26/client/sampling
With sampling support, when an MCP server needs an LLM inference, it would send a sampling/createMessage request to the Claude Code client (the user's instance). This would allow the Claude Code client to:
- Utilize the user's existing Claude Max subscription: The inference request would be processed by the user's locally running Claude Code instance, falling under their subscription's usage limits, eliminating the need for the MCP server to manage separate API keys or directly configure access to the Anthropic API.
- Provide human-in-the-loop oversight (optional but beneficial): The user could review and approve/edit the prompt before it's sent to Claude, and review the completion before it's returned to the MCP server, leveraging Claude Code's existing interactive capabilities. This enhances transparency and control.
- Reduce pay-as-you-go API costs: This would eliminate the need for MCP servers to make direct API calls for inferences, significantly reducing unexpected costs for Claude Max subscribers.
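Per the linked spec, a server-initiated sampling request is an ordinary JSON-RPC message sent from server to client. A minimal sketch of its shape (top-level field names follow the spec; the prompt text, model hint, and token limit are illustrative):

```python
import json

# Illustrative sampling/createMessage payload an MCP server would send to
# the connected client (here, the user's Claude Code instance). The prompt
# and the model hint are made up for the example.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "sampling/createMessage",
    "params": {
        "messages": [
            {
                "role": "user",
                "content": {"type": "text", "text": "Summarize the failing test output."},
            }
        ],
        "modelPreferences": {"hints": [{"name": "claude-3-5-sonnet"}]},
        "maxTokens": 256,
    },
}

print(json.dumps(request, indent=2))
```

The client (not the server) decides which model actually serves the request, which is exactly what lets the inference fall under the user's subscription.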
Summary
The MCP "sampling" feature is designed precisely for scenarios where a server requires LLM assistance but wants to delegate the actual inference and associated costs/control to the client. Implementing this feature in Claude Code would align it more closely with the broader MCP philosophy and significantly enhance the value proposition for Claude Max subscribers who integrate Claude Code with custom MCP servers.
This would make Claude Code a more cost-effective and flexible solution for advanced agentic workflows.
By the way, the VS Code team just shipped it: https://github.com/microsoft/vscode/issues/244162/ https://x.com/kentcdodds/status/1931049244669915217
@mateuszmazurek We're looking into this! Can you share more about the server you're using or envisioning, and what use cases you're imagining?
@ashwin-ant I don't have any specific MCP server in mind. Let's be honest, almost none of them support sampling yet because virtually no sensible MCP client supports it, creating a bit of a Catch-22. However, I believe now that VS Code has implemented it, more and more servers will start using it since it's a win-win for users. I know that @eyaltoledano, the creator of Task Master, is very positive about this, which he mentioned in this comment: https://github.com/eyaltoledano/claude-task-master/issues/712#issuecomment-2954261468
Would love to see this implemented!
Using Claude Code with claude-task-master (an MCP server for task management), for example: without sampling there's no choice but to configure the MCP server with external LLM keys.
Despite having a Claude Max subscription, this means I need to configure and pay for API access separately, creating unnecessary friction and cost.
With sampling support, the MCP server could delegate these LLM requests back to Claude Code, eliminating both the extra cost and the configuration complexity (not to mention the implementation effort in the MCP server itself and others like it).
There's still the use case for MCPs to manage their own LLM connection, but full support of the MCP protocol seems like a no-brainer in this case!
This opens up a whole range of use cases for Claude-Code:
- MCP reads your code → asks Claude-Code to analyze complexity → returns actionable insights
- MCP detects a failing test → requests Claude-Code to suggest fixes → applies the best solution
- MCP monitors file changes → queries Claude-Code for impact analysis → updates related docs
- MCP receives user command → gets Claude-Code to plan approach → executes step by step
- MCP finds TODO comments → asks Claude-Code to expand them → creates implementation tasks
Each interaction leverages your existing Claude instance rather than requiring separate API calls.
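The failing-test bullet above can be sketched as a tool handler. This is a stub, not the MCP SDK: the `create_message` callback stands in for the client side of `sampling/createMessage` (with the official MCP Python SDK this would roughly be a `ctx.session.create_message(...)` round-trip), and all names here are hypothetical:

```python
import asyncio
from typing import Awaitable, Callable

# Stand-in for the client side of sampling/createMessage. In a real server
# this would round-trip through the MCP session to the user's Claude Code
# instance; here it is stubbed so the delegation pattern is visible alone.
SamplingFn = Callable[[str], Awaitable[str]]


async def suggest_fix(test_name: str, error_output: str, create_message: SamplingFn) -> str:
    """Tool sketch: the server builds the prompt, the client runs the inference."""
    prompt = (
        f"The test '{test_name}' fails with:\n{error_output}\n"
        "Suggest a minimal fix."
    )
    # No API key on the server side; cost and control stay with the client.
    return await create_message(prompt)


async def fake_client(prompt: str) -> str:
    # Pretend to be the user's Claude Code session answering the request.
    return f"stubbed completion for {len(prompt)}-char prompt"


reply = asyncio.run(suggest_fix("test_login", "AssertionError: 401 != 200", fake_client))
print(reply)
```

The same shape covers the other bullets: only the prompt-building step changes, while the inference itself is always delegated back over the existing connection.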
Would love this, too. Shocked when I saw
`Error: Error requesting sampling: session does not support sampling`
@ashwin-ant
I would like to use Sampling in my TailwindPlus MCP server to ask the client's LLM to compare fragments of code between the user's project and the source components from the UI framework.
I'm hoping this can provide a "similarity assessment", not just a technical diff, because source components are often customized and drift from the original code. I could do this by sending the components into the user's session and letting the model compare them there, but that would consume far too much context; there are many components. With sampling, I can also control when it is used from the MCP server side.
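A rough sketch of the per-component prompt such a server could send as an individual sampling request (function name and rubric are made up), which keeps the full component library out of the user's main session context:

```python
def build_similarity_prompt(user_fragment: str, source_fragment: str) -> str:
    """Hypothetical prompt for one server-initiated "similarity assessment".

    One sampling request per component pair means the (many) source
    components never have to be pasted into the user's main session.
    """
    return (
        "Rate 0-10 how closely USER still follows SOURCE, then list the "
        "meaningful customizations (ignore whitespace and renames).\n\n"
        f"USER:\n{user_fragment}\n\nSOURCE:\n{source_fragment}"
    )


prompt = build_similarity_prompt('<div class="btn">Go</div>', '<button class="btn">Go</button>')
print(prompt)
```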
Chiming in as well. I'd love to be able to use MCP sampling through Claude Code; there are some nontrivial second-order effects I'd like to derive with my MCP servers (e.g. define how to update some state based on an assessment) and it would be great to be able to do this in-band.
Just an FYI that Taskmaster supports consuming the Claude Code CLI (and your claude subscription) without the need for additional API keys.
Just run `task-master models --setup` and select sonnet / claude-code, or tell the LLM to use that as your main model.
> Just an FYI that Taskmaster supports consuming the Claude Code CLI (and your claude subscription) without the need for additional API keys.
> Just run `task-master models --setup` and select sonnet / claude-code, or tell the LLM to use that as your main model.
What's the relation with the current issue???
> @ashwin-ant I don't have any specific MCP server in mind. Let's be honest, almost none of them support sampling yet because virtually no sensible MCP client supports it, creating a bit of a Catch-22. However, I believe now that VS Code has implemented it, more and more servers will start using it since it's a win-win for users. I know that @eyaltoledano, the creator of Task Master, is very positive about this, which he mentioned in this comment: eyaltoledano/claude-task-master#712 (comment)
referring to this
sampling is not required if you’re specifically trying to use claude code subscription to power taskmaster
but sampling itself within claude would be awesome either way.
Echoing the request for sampling in Claude Code. I work on internal MCP servers for my job, and having sampling would be of huge benefit to me!
+1
+1
+1
+1
+1
+1
+1
+1 for this
The MCP I'm willing to use that needs MCP Sampling is: https://github.com/cameroncooke/XcodeBuildMCP
+1 - I have use cases for this, and I am using GitHub Copilot for some of these things because it supports it. I was kind of blown away, since you are the originators of the MCP protocol; I figured you would be at the forefront. Elicitation would also be nice.
Just checking in on this feature. Ideally, I would like to use sampling for some complex tools. I have tested it out on other agents with amazing results and would really love to get it working with Claude Code. Any ETA?
+1
+1
I have a use case where I am working on an MCP server that maps interactions to template-driven documentation, growing docs organically over the course of working sessions. These templates can be user-defined and contain use_when: ... examples. I'd love to be able to map extracted insights to the appropriate templates and was looking at leveraging MCP sampling to help with that classification. It seems really useful for things like this.
Also agree that Elicitation would be great as well - I don't have a current use case, but seems really useful.
+1
+1! MCP Sampling would make some of our use cases in MCP tools development much more straightforward!
A concrete use case from the Dart/Flutter MCP server is allowing a "bring your own model" approach to AI assistance features in your running Flutter or Dart application and the associated development tools. One of the biggest advantages of MCP itself is this openness and model-agnosticism; it allows us to meet developers where they are.
Specifically, this would enable tools such as "ask AI" buttons in the Flutter widget inspector, network panel, etc., bringing AI assistance directly to the developer tools/emulators where the user is seeing the information and errors, instead of requiring them to describe to the AI agent the context they want it to fetch. Much of this context is available via MCP tools, but that requires a context switch, and the discoverability of these features is poor: in our experience, agents don't tend to use them proactively and require explicit prompting, which in turn requires user knowledge.