
[FEATURE] Agent Token Usage API - Critical Infrastructure for Agent Development Ecosystem

Open ProgenyAlpha opened this issue 2 months ago • 3 comments

Problem Statement

Claude Code's agent ecosystem is growing rapidly, but agent developers are flying blind when it comes to resource consumption. Without programmatic access to token metrics, the community cannot:

  • Build cost-aware agents that adapt their behavior based on budget
  • Optimize agent implementations through data-driven iteration
  • Provide users with transparency about resource consumption
  • Make informed tradeoffs between speed, cost, and thoroughness
  • Justify agent adoption to organizations evaluating Claude Code

This missing infrastructure is holding back the entire agent ecosystem.

Current Limitations

1. No Programmatic Access

Agents cannot query their own token consumption. There's no tool equivalent to Read or Bash for metrics access.

2. Max/Pro Subscriber Exclusion

The /cost command explicitly excludes Max/Pro users with:

> /cost
  ⎿  With your Claude Max subscription, no need to monitor cost — your subscription includes Claude Code usage

Zero data. Zero visibility. Zero metrics for development.

Even for developers who WANT to optimize their agents, there's no way to see consumption patterns.

3. Session-Level Only

Even when /cost works, it provides session totals with no per-agent attribution. Parallel agent execution makes manual tracking impossible.

4. No Sub-Agent Attribution

When agents spawn sub-agents (common in complex workflows), there's no way to understand the resource breakdown across the execution tree.

Impact on the Community

Agent Developers Can't Optimize

Without metrics, developers resort to:

  • Estimating based on prompt length (highly inaccurate with caching)
  • Manual token counting (doesn't account for thinking tokens, tool use overhead)
  • Blind iteration hoping for improvements

Data-driven optimization is impossible.

Users Have No Transparency

When agents complete, users don't know:

  • Which agents were resource-intensive
  • Whether cheaper alternatives exist
  • If they're getting value for token spend
  • How to budget for agent workflows

Trust and adoption suffer.

Organizations Can't Justify Adoption

Enterprises evaluating Claude Code need:

  • Cost attribution by department/project
  • Per-agent efficiency metrics
  • ROI calculations for automation
  • Budget forecasting for agent usage

Without visibility, procurement stalls.

Real-World Use Cases

1. UI/UX Analysis Agents

Scenario: Analyzing accessibility across 50 components

Current state: No idea if agent consumed 10K or 100K tokens until session ends

With token API: Agent reports "Analyzed 12 components, 2,400 tokens (~$0.03), found 8 issues" - users can choose between fast/cheap scan vs. deep/expensive analysis

2. Code Review Agents

Scenario: Reviewing PR with 2,000 lines changed

Current state: Can't compare "quick review" vs "comprehensive review" agent implementations

With token API: Agents expose the cost/benefit tradeoff - "Quick review: 5K tokens, 15 min" vs "Deep review: 25K tokens, 45 min, catches 3x more issues"

3. Documentation Generation Agents

Scenario: Auto-generating API documentation for 100 endpoints

Current state: No visibility into per-endpoint cost, can't optimize template design

With token API: Track "$0.02/endpoint, 800 tokens average" - identify expensive outliers, optimize templates, forecast costs for larger codebases

4. Testing Agents

Scenario: Generating unit tests for untested code

Current state: No way to measure efficiency gains from different prompting strategies

With token API: Compare "example-based prompting: 3K tokens/test" vs "reasoning-based: 8K tokens/test with 30% better coverage" - make informed architectural decisions

5. Refactoring Agents

Scenario: Automated code modernization across legacy codebase

Current state: Can't calculate ROI or justify agent usage to management

With token API: Report "Refactored 25 files, $2.40 total cost, saved 8 hours manual work" - clear value proposition

6. Research Agents

Scenario: Multi-source research aggregation

Current state: Don't know if web fetching or local analysis is more efficient

With token API: Measure "Web research: 15K tokens but fresher data" vs "Local analysis: 6K tokens from cached docs" - optimize strategy based on requirements

7. Multi-Agent Orchestration

Scenario: Chief architect coordinating researcher + planner + implementer

Current state: No breakdown showing which phase consumed resources

With token API: Display "Research: 12K tokens, Planning: 3K tokens, Implementation: 8K tokens" - optimize workflow by identifying bottlenecks

Proposed Solution

Add GetTokenUsage as a built-in tool agents can call (similar to Read, Bash, Grep):

# Agent calls during or after execution
GetTokenUsage(
    scope="current_agent"  # or "session", "subtask_tree"
)

# Returns real-time metrics
{
    "agent_id": "ui-analyzer-instance-123",
    "input_tokens": 1800,
    "output_tokens": 600,
    "cache_read_tokens": 450,
    "cache_creation_tokens": 200,
    "thinking_tokens": 300,
    "total": 3350,
    "model": "claude-sonnet-4-5-20250929",
    "subtasks": [
        {
            "agent": "accessibility-checker",
            "total": 1200
        }
    ]
}
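If the tool existed, an agent could poll its own spend mid-run and degrade gracefully as it approaches a budget. A minimal sketch of that control flow, with GetTokenUsage stubbed out (the tool, its scope parameter, and the response fields are all proposed here, not shipped):

```python
# Budget-aware loop built on the *proposed* GetTokenUsage tool.
# GetTokenUsage does not exist today; this stub only mimics the
# response shape from the example above.
def GetTokenUsage(scope="current_agent"):
    return {"agent_id": "ui-analyzer-instance-123", "total": 3_350}

TOKEN_BUDGET = 50_000  # hypothetical per-run budget

def analyze(components):
    """Analyze components until ~90% of the token budget is consumed."""
    done = []
    for component in components:
        if GetTokenUsage()["total"] >= TOKEN_BUDGET * 0.9:
            break  # stop early and report partial results instead of overrunning
        done.append(component)  # placeholder for the real analysis step
    return done
```

This is the "fast/cheap scan vs. deep/expensive analysis" choice from the use cases below, made by the agent itself instead of discovered after the session ends.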

Key Requirements

  1. Self-service access - No infrastructure setup required (unlike OpenTelemetry)
  2. Agent-scoped metrics - Track per-agent and per-subtask consumption
  3. Real-time availability - Query during execution for adaptive behavior
  4. Max/Pro support - Show token counts even if dollar costs are hidden
  5. Hierarchical attribution - Track parent-agent → sub-agent relationships
  6. Zero config - Works out of the box like other built-in tools
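Requirement 5 implies the nested `subtasks` list in the response forms a tree that callers can roll up. A sketch of that rollup, assuming each agent's `total` excludes its sub-agents (field names follow the example response above; the numbers are illustrative):

```python
def rollup(usage):
    """Combined token total for an agent plus all descendant sub-agents."""
    return usage.get("total", 0) + sum(
        rollup(sub) for sub in usage.get("subtasks", [])
    )

# Shaped like the multi-agent orchestration use case above.
report = {
    "agent_id": "chief-architect",
    "total": 8_000,
    "subtasks": [
        {"agent": "researcher", "total": 12_000},
        {"agent": "planner", "total": 3_000},
    ],
}
```

Whichever convention the API picks (parent totals inclusive or exclusive of sub-agents), it should be documented, since it changes whether this kind of rollup is needed at all.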

Benefits to Claude Code Ecosystem

For Agent Developers

  • Data-driven optimization through iteration metrics
  • Cost-aware agents that adapt behavior based on budget constraints
  • Better user experience through transparent resource reporting
  • Faster debugging by identifying token-heavy operations

For End Users

  • Transparency - Understand what agents are doing and what it costs
  • Choice - Select between fast/cheap vs thorough/expensive options
  • Trust - Agents that self-report build confidence
  • Budget control - Make informed decisions about agent usage

For Organizations

  • Cost attribution by team/project/workflow
  • ROI calculations for automation investments
  • Procurement justification with concrete efficiency metrics
  • Optimization opportunities identified through usage patterns

For Anthropic/Claude Code

  • Accelerated agent ecosystem growth - Remove blocker to serious agent development
  • Competitive differentiation - First AI code assistant with agent-level metrics
  • Community contribution quality - Better agents shared when developers can optimize
  • Enterprise adoption - Visibility enables organizational buy-in

Why Existing Solutions Don't Work

/cost command:

  • Session-level totals only (no agent attribution)
  • Completely disabled for Max/Pro subscribers (see error above)
  • Requires manual before/after calculation (broken by parallel execution)
  • Not programmatically accessible to agents

OpenTelemetry integration:

  • Requires admin infrastructure setup
  • No agent-level attribution (session-level only)
  • External monitoring stack dependency
  • Still no visibility for Max/Pro users
  • Not accessible from within agent code

Manual estimation:

  • Wildly inaccurate with prompt caching
  • Doesn't account for thinking tokens or tool overhead
  • Can't predict multi-turn conversation costs
  • No visibility into sub-agent spawning
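To see why prompt-length estimates drift so badly: the billed amount depends on how input splits across fresh, cache-read, and cache-creation tokens, and the agent cannot observe that split today. A sketch with illustrative (not official) per-million-token rates:

```python
# Illustrative placeholder rates, dollars per million tokens -- not real pricing.
RATES = {
    "input": 3.00,
    "output": 15.00,
    "cache_read": 0.30,
    "cache_creation": 3.75,
}

def cost(usage):
    """Dollar cost for a usage record keyed by token category."""
    return sum(usage.get(k, 0) / 1_000_000 * rate for k, rate in RATES.items())

# Two requests with identical prompt lengths, very different bills:
cold = {"input": 100_000, "output": 2_000}                       # no cache hits
warm = {"input": 10_000, "cache_read": 90_000, "output": 2_000}  # mostly cached
```

Under these assumed rates the cached request costs a small fraction of the cold one, so any estimate keyed only to prompt length is off by whatever the cache hit rate happens to be.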

Related Issues

  • #10164 - Sub-agent token usage visibility (subset of this request)
  • #777 - Agent awareness of token usage (original request, minimal detail)
  • #6925 - Enterprise billback (closed with "use OTel" which doesn't solve agent-level access)

Conclusion

Token usage metrics are foundational infrastructure for the agent development ecosystem. Just as agents need Read to access files and Bash to run commands, they need GetTokenUsage to understand and optimize their resource consumption.

Without this capability:

  • Developers build inefficient agents
  • Users lack transparency and trust
  • Organizations can't justify adoption
  • The agent ecosystem remains hobbled

This feature would unlock a new level of sophistication in agent development, enabling cost-aware, self-optimizing, and transparent agents that users and organizations can confidently adopt.

ProgenyAlpha · Oct 26 '25 19:10