[FEATURE] Agent Token Usage API - Critical Infrastructure for Agent Development Ecosystem
Problem Statement
Claude Code's agent ecosystem is growing rapidly, but agent developers are flying blind when it comes to resource consumption. Without programmatic access to token metrics, the community cannot:
- Build cost-aware agents that adapt their behavior based on budget
- Optimize agent implementations through data-driven iteration
- Provide users with transparency about resource consumption
- Make informed tradeoffs between speed, cost, and thoroughness
- Justify agent adoption to organizations evaluating Claude Code
This missing infrastructure is holding back the entire agent ecosystem.
Current Limitations
1. No Programmatic Access
Agents cannot query their own token consumption. There's no tool equivalent to Read or Bash for metrics access.
2. Max/Pro Subscriber Exclusion
The /cost command explicitly excludes Max/Pro users, responding only with:
> /cost
⎿ With your Claude Max subscription, no need to monitor cost — your subscription includes Claude Code usage
Zero data. Zero visibility. Zero metrics for development.
Even for developers who WANT to optimize their agents, there's no way to see consumption patterns.
3. Session-Level Only
Even when /cost works, it provides session totals with no per-agent attribution. Parallel agent execution makes manual tracking impossible.
4. No Sub-Agent Attribution
When agents spawn sub-agents (common in complex workflows), there's no way to understand the resource breakdown across the execution tree.
Impact on the Community
Agent Developers Can't Optimize
Without metrics, developers resort to:
- Estimating based on prompt length (highly inaccurate with caching)
- Manual token counting (doesn't account for thinking tokens, tool use overhead)
- Blind iteration hoping for improvements
Data-driven optimization is impossible.
Users Have No Transparency
When agents complete, users don't know:
- Which agents were resource-intensive
- Whether cheaper alternatives exist
- If they're getting value for token spend
- How to budget for agent workflows
Trust and adoption suffer.
Organizations Can't Justify Adoption
Enterprises evaluating Claude Code need:
- Cost attribution by department/project
- Per-agent efficiency metrics
- ROI calculations for automation
- Budget forecasting for agent usage
Without visibility, procurement stalls.
Real-World Use Cases
1. UI/UX Analysis Agents
Scenario: Analyzing accessibility across 50 components
Current state: No idea if agent consumed 10K or 100K tokens until session ends
With token API: Agent reports "Analyzed 12 components, 2,400 tokens (~$0.03), found 8 issues" - users can choose between fast/cheap scan vs. deep/expensive analysis
2. Code Review Agents
Scenario: Reviewing PR with 2,000 lines changed
Current state: Can't compare "quick review" vs "comprehensive review" agent implementations
With token API: Agents expose cost/benefit tradeoff - "Quick review: 5K tokens, 15 min" vs "Deep review: 25K tokens, 45 min, catches 3x more issues"
3. Documentation Generation Agents
Scenario: Auto-generating API documentation for 100 endpoints
Current state: No visibility into per-endpoint cost, can't optimize template design
With token API: Track "$0.02/endpoint, 800 tokens average" - identify expensive outliers, optimize templates, forecast costs for larger codebases
4. Testing Agents
Scenario: Generating unit tests for untested code
Current state: No way to measure efficiency gains from different prompting strategies
With token API: Compare "example-based prompting: 3K tokens/test" vs "reasoning-based: 8K tokens/test with 30% better coverage" - make informed architectural decisions
5. Refactoring Agents
Scenario: Automated code modernization across legacy codebase
Current state: Can't calculate ROI or justify agent usage to management
With token API: Report "Refactored 25 files, $2.40 total cost, saved 8 hours manual work" - clear value proposition
6. Research Agents
Scenario: Multi-source research aggregation
Current state: Don't know if web fetching or local analysis is more efficient
With token API: Measure "Web research: 15K tokens but fresher data" vs "Local analysis: 6K tokens from cached docs" - optimize strategy based on requirements
7. Multi-Agent Orchestration
Scenario: Chief architect coordinating researcher + planner + implementer
Current state: No breakdown showing which phase consumed resources
With token API: Display "Research: 12K tokens, Planning: 3K tokens, Implementation: 8K tokens" - optimize workflow by identifying bottlenecks
Proposed Solution
Add GetTokenUsage as a built-in tool agents can call (similar to Read, Bash, Grep):
```python
# Agent calls during or after execution
GetTokenUsage(
    scope="current_agent"  # or "session", "subtask_tree"
)

# Returns real-time metrics
{
    "agent_id": "ui-analyzer-instance-123",
    "input_tokens": 1800,
    "output_tokens": 600,
    "cache_read_tokens": 450,
    "cache_creation_tokens": 200,
    "thinking_tokens": 300,
    "total": 3350,
    "model": "claude-sonnet-4-5-20250929",
    "subtasks": [
        {
            "agent": "accessibility-checker",
            "total": 1200
        }
    ]
}
```
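As a sketch of what "cost-aware" could look like in practice: the snippet below assumes the tool and response shape proposed above. Since GetTokenUsage does not exist yet, a local get_token_usage() stub returning the example payload stands in for the real tool call; the budget value and strategy names are hypothetical.

```python
TOKEN_BUDGET = 20_000  # hypothetical caller-supplied budget for this agent run

def get_token_usage(scope: str = "current_agent") -> dict:
    """Stand-in for the proposed GetTokenUsage tool; returns the example payload above."""
    return {
        "agent_id": "ui-analyzer-instance-123",
        "input_tokens": 1800,
        "output_tokens": 600,
        "cache_read_tokens": 450,
        "cache_creation_tokens": 200,
        "thinking_tokens": 300,
        "total": 3350,
        "subtasks": [{"agent": "accessibility-checker", "total": 1200}],
    }

def choose_strategy() -> str:
    """Pick the next pass depth based on how much of the budget remains."""
    spent = get_token_usage(scope="current_agent")["total"]
    remaining = TOKEN_BUDGET - spent
    if remaining < 0.2 * TOKEN_BUDGET:
        return "fast-shallow-scan"   # running low: cheap pass, save tokens for reporting
    return "deep-analysis"           # plenty of headroom: thorough pass

print(f"Spent {get_token_usage()['total']:,} of {TOKEN_BUDGET:,} tokens; next pass: {choose_strategy()}")
```

The point is not the specific thresholds but that the decision can be made mid-run, which only works if the metrics are queryable during execution.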
Key Requirements
- Self-service access - No infrastructure setup required (unlike OpenTelemetry)
- Agent-scoped metrics - Track per-agent and per-subtask consumption
- Real-time availability - Query during execution for adaptive behavior
- Max/Pro support - Show token counts even if dollar costs are hidden
- Hierarchical attribution - Track parent-agent → sub-agent relationships (see the rollup sketch after this list)
- Zero config - Works out of the box like other built-in tools
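To make the hierarchical-attribution requirement concrete, here is a minimal sketch of how an orchestrator or reporting layer might roll a subtask_tree response into a per-agent breakdown. It assumes nested subtasks follow the same shape as the top-level response, which is an assumption this proposal would need to pin down; the figures mirror the multi-agent orchestration use case above and are illustrative only.

```python
def flatten_usage(node: dict, depth: int = 0) -> list[tuple[str, int, int]]:
    """Return (agent name, total tokens, depth) rows for a node and all of its descendants."""
    name = node.get("agent") or node.get("agent_id", "root")
    rows = [(name, node["total"], depth)]
    for child in node.get("subtasks", []):
        rows.extend(flatten_usage(child, depth + 1))
    return rows

# Illustrative response for a chief architect coordinating three sub-agents.
usage = {
    "agent_id": "chief-architect",
    "total": 23_000,
    "subtasks": [
        {"agent": "researcher", "total": 12_000},
        {"agent": "planner", "total": 3_000},
        {"agent": "implementer", "total": 8_000},
    ],
}

for agent, total, depth in flatten_usage(usage):
    print(f"{'  ' * depth}{agent}: {total:,} tokens")
```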
Benefits to Claude Code Ecosystem
For Agent Developers
- Data-driven optimization through iteration metrics
- Cost-aware agents that adapt behavior based on budget constraints
- Better user experience through transparent resource reporting
- Faster debugging by identifying token-heavy operations
For End Users
- Transparency - Understand what agents are doing and what it costs
- Choice - Select between fast/cheap vs thorough/expensive options
- Trust - Agents that self-report build confidence
- Budget control - Make informed decisions about agent usage
For Organizations
- Cost attribution by team/project/workflow
- ROI calculations for automation investments
- Procurement justification with concrete efficiency metrics
- Optimization opportunities identified through usage patterns
For Anthropic/Claude Code
- Accelerated agent ecosystem growth - Remove blocker to serious agent development
- Competitive differentiation - First AI code assistant with agent-level metrics
- Community contribution quality - Better agents shared when developers can optimize
- Enterprise adoption - Visibility enables organizational buy-in
Why Existing Solutions Don't Work
/cost command:
- Session-level totals only (no agent attribution)
- Completely disabled for Max/Pro subscribers (see message above)
- Requires manual before/after calculation (broken by parallel execution)
- Not programmatically accessible to agents
OpenTelemetry integration:
- Requires admin infrastructure setup
- No agent-level attribution (session-level only)
- External monitoring stack dependency
- Still no visibility for Max/Pro users
- Not accessible from within agent code
Manual estimation (illustrated after this list):
- Wildly inaccurate with prompt caching
- Doesn't account for thinking tokens or tool overhead
- Can't predict multi-turn conversation costs
- No visibility into sub-agent spawning
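For a sense of scale, a small illustration with made-up numbers, reusing the example payload from the proposal above: a chars/4 heuristic only sees visible prompt and response text, so cache traffic and thinking tokens never enter the estimate.

```python
# Made-up numbers: why a length-based estimate drifts from field-level totals.
prompt_chars, response_chars = 6_000, 2_400
naive_estimate = (prompt_chars + response_chars) // 4   # ~4 chars/token heuristic -> 2,100

actual = {
    "input_tokens": 1800,
    "output_tokens": 600,
    "cache_read_tokens": 450,      # invisible to a length-based estimate
    "cache_creation_tokens": 200,  # invisible to a length-based estimate
    "thinking_tokens": 300,        # invisible to a length-based estimate
}
print(f"naive estimate: {naive_estimate:,} tokens, actual: {sum(actual.values()):,} tokens")
```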
Related Issues
- #10164 - Sub-agent token usage visibility (subset of this request)
- #777 - Agent awareness of token usage (original request, minimal detail)
- #6925 - Enterprise billback (closed with "use OTel" which doesn't solve agent-level access)
Conclusion
Token usage metrics are foundational infrastructure for the agent development ecosystem. Just as agents need Read to access files and Bash to run commands, they need GetTokenUsage to understand and optimize their resource consumption.
Without this capability:
- Developers build inefficient agents
- Users lack transparency and trust
- Organizations can't justify adoption
- The agent ecosystem remains hobbled
This feature would unlock a new level of sophistication in agent development, enabling cost-aware, self-optimizing, and transparent agents that users and organizations can confidently adopt.