
[FEATURE] Agent Token Usage API - Critical Infrastructure for Agent Development Ecosystem

Open ProgenyAlpha opened this issue 2 months ago • 3 comments

Problem Statement

Claude Code's agent ecosystem is growing rapidly, but agent developers are flying blind when it comes to resource consumption. Without programmatic access to token metrics, the community cannot:

  • Build cost-aware agents that adapt their behavior based on budget
  • Optimize agent implementations through data-driven iteration
  • Provide users with transparency about resource consumption
  • Make informed tradeoffs between speed, cost, and thoroughness
  • Justify agent adoption to organizations evaluating Claude Code

This missing infrastructure is holding back the entire agent ecosystem.

Current Limitations

1. No Programmatic Access

Agents cannot query their own token consumption. There's no tool equivalent to Read or Bash for metrics access.

2. Max/Pro Subscriber Exclusion

The /cost command explicitly excludes Max/Pro users with:

> /cost
  ⎿  With your Claude Max subscription, no need to monitor cost — your subscription includes Claude Code usage

Zero data. Zero visibility. Zero metrics for development.

Even for developers who WANT to optimize their agents, there's no way to see consumption patterns.

3. Session-Level Only

Even when /cost works, it provides session totals with no per-agent attribution. Parallel agent execution makes manual tracking impossible.

4. No Sub-Agent Attribution

When agents spawn sub-agents (common in complex workflows), there's no way to understand the resource breakdown across the execution tree.

Impact on the Community

Agent Developers Can't Optimize

Without metrics, developers resort to:

  • Estimating based on prompt length (highly inaccurate with caching)
  • Manual token counting (doesn't account for thinking tokens, tool use overhead)
  • Blind iteration hoping for improvements

Data-driven optimization is impossible.

Users Have No Transparency

When agents complete, users don't know:

  • Which agents were resource-intensive
  • Whether cheaper alternatives exist
  • If they're getting value for token spend
  • How to budget for agent workflows

Trust and adoption suffer.

Organizations Can't Justify Adoption

Enterprises evaluating Claude Code need:

  • Cost attribution by department/project
  • Per-agent efficiency metrics
  • ROI calculations for automation
  • Budget forecasting for agent usage

Without visibility, procurement stalls.

Real-World Use Cases

1. UI/UX Analysis Agents

Scenario: Analyzing accessibility across 50 components

Current state: No idea if agent consumed 10K or 100K tokens until session ends

With token API: Agent reports "Analyzed 12 components, 2,400 tokens (~$0.03), found 8 issues" - users can choose between fast/cheap scan vs. deep/expensive analysis

2. Code Review Agents

Scenario: Reviewing PR with 2,000 lines changed

Current state: Can't compare "quick review" vs "comprehensive review" agent implementations

With token API: Agents expose the cost/benefit tradeoff - "Quick review: 5K tokens, 15 min" vs "Deep review: 25K tokens, 45 min, catches 3x more issues"

3. Documentation Generation Agents

Scenario: Auto-generating API documentation for 100 endpoints

Current state: No visibility into per-endpoint cost, can't optimize template design

With token API: Track "$0.02/endpoint, 800 tokens average" - identify expensive outliers, optimize templates, forecast costs for larger codebases

4. Testing Agents

Scenario: Generating unit tests for untested code

Current state: No way to measure efficiency gains from different prompting strategies

With token API: Compare "example-based prompting: 3K tokens/test" vs "reasoning-based: 8K tokens/test with 30% better coverage" - make informed architectural decisions

5. Refactoring Agents

Scenario: Automated code modernization across legacy codebase

Current state: Can't calculate ROI or justify agent usage to management

With token API: Report "Refactored 25 files, $2.40 total cost, saved 8 hours manual work" - clear value proposition

6. Research Agents

Scenario: Multi-source research aggregation

Current state: Don't know if web fetching or local analysis is more efficient

With token API: Measure "Web research: 15K tokens but fresher data" vs "Local analysis: 6K tokens from cached docs" - optimize strategy based on requirements

7. Multi-Agent Orchestration

Scenario: Chief architect coordinating researcher + planner + implementer

Current state: No breakdown showing which phase consumed resources

With token API: Display "Research: 12K tokens, Planning: 3K tokens, Implementation: 8K tokens" - optimize workflow by identifying bottlenecks

Proposed Solution

Add GetTokenUsage as a built-in tool agents can call (similar to Read, Bash, Grep):

# Agent calls during or after execution
GetTokenUsage(
    scope="current_agent"  # or "session", "subtask_tree"
)

# Returns real-time metrics
{
    "agent_id": "ui-analyzer-instance-123",
    "input_tokens": 1800,
    "output_tokens": 600,
    "cache_read_tokens": 450,
    "cache_creation_tokens": 200,
    "thinking_tokens": 300,
    "total": 3350,
    "model": "claude-sonnet-4-5-20250929",
    "subtasks": [
        {
            "agent": "accessibility-checker",
            "total": 1200
        }
    ]
}
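If the tool existed, an agent could poll its own spend mid-run and degrade gracefully as it approaches a budget. A minimal sketch of that control flow, with GetTokenUsage stubbed out (the tool, its scope parameter, and the response fields are all proposed here, not shipped):

```python
# Budget-aware loop built on the *proposed* GetTokenUsage tool.
# GetTokenUsage does not exist today; this stub only mimics the
# response shape from the example above.
def GetTokenUsage(scope="current_agent"):
    return {"agent_id": "ui-analyzer-instance-123", "total": 3_350}

TOKEN_BUDGET = 50_000  # hypothetical per-run budget

def analyze(components):
    """Analyze components until ~90% of the token budget is consumed."""
    done = []
    for component in components:
        if GetTokenUsage()["total"] >= TOKEN_BUDGET * 0.9:
            break  # stop early and report partial results instead of overrunning
        done.append(component)  # placeholder for the real analysis step
    return done
```

This is the "fast/cheap scan vs. deep/expensive analysis" choice from the use cases below, made by the agent itself instead of discovered after the session ends.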

Key Requirements

  1. Self-service access - No infrastructure setup required (unlike OpenTelemetry)
  2. Agent-scoped metrics - Track per-agent and per-subtask consumption
  3. Real-time availability - Query during execution for adaptive behavior
  4. Max/Pro support - Show token counts even if dollar costs are hidden
  5. Hierarchical attribution - Track parent-agent → sub-agent relationships
  6. Zero config - Works out of the box like other built-in tools
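Requirement 5 implies the nested `subtasks` list in the response forms a tree that callers can roll up. A sketch of that rollup, assuming each agent's `total` excludes its sub-agents (field names follow the example response above; the numbers are illustrative):

```python
def rollup(usage):
    """Combined token total for an agent plus all descendant sub-agents."""
    return usage.get("total", 0) + sum(
        rollup(sub) for sub in usage.get("subtasks", [])
    )

# Shaped like the multi-agent orchestration use case above.
report = {
    "agent_id": "chief-architect",
    "total": 8_000,
    "subtasks": [
        {"agent": "researcher", "total": 12_000},
        {"agent": "planner", "total": 3_000},
    ],
}
```

Whichever convention the API picks (parent totals inclusive or exclusive of sub-agents), it should be documented, since it changes whether this kind of rollup is needed at all.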

Benefits to Claude Code Ecosystem

For Agent Developers

  • Data-driven optimization through iteration metrics
  • Cost-aware agents that adapt behavior based on budget constraints
  • Better user experience through transparent resource reporting
  • Faster debugging by identifying token-heavy operations

For End Users

  • Transparency - Understand what agents are doing and what it costs
  • Choice - Select between fast/cheap vs thorough/expensive options
  • Trust - Agents that self-report build confidence
  • Budget control - Make informed decisions about agent usage

For Organizations

  • Cost attribution by team/project/workflow
  • ROI calculations for automation investments
  • Procurement justification with concrete efficiency metrics
  • Optimization opportunities identified through usage patterns

For Anthropic/Claude Code

  • Accelerated agent ecosystem growth - Remove blocker to serious agent development
  • Competitive differentiation - First AI code assistant with agent-level metrics
  • Community contribution quality - Better agents shared when developers can optimize
  • Enterprise adoption - Visibility enables organizational buy-in

Why Existing Solutions Don't Work

/cost command:

  • Session-level totals only (no agent attribution)
  • Completely disabled for Max/Pro subscribers (see error above)
  • Requires manual before/after calculation (broken by parallel execution)
  • Not programmatically accessible to agents

OpenTelemetry integration:

  • Requires admin infrastructure setup
  • No agent-level attribution (session-level only)
  • External monitoring stack dependency
  • Still no visibility for Max/Pro users
  • Not accessible from within agent code

Manual estimation:

  • Wildly inaccurate with prompt caching
  • Doesn't account for thinking tokens or tool overhead
  • Can't predict multi-turn conversation costs
  • No visibility into sub-agent spawning
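To see why prompt-length estimates drift so badly: the billed amount depends on how input splits across fresh, cache-read, and cache-creation tokens, and the agent cannot observe that split today. A sketch with illustrative (not official) per-million-token rates:

```python
# Illustrative placeholder rates, dollars per million tokens -- not real pricing.
RATES = {
    "input": 3.00,
    "output": 15.00,
    "cache_read": 0.30,
    "cache_creation": 3.75,
}

def cost(usage):
    """Dollar cost for a usage record keyed by token category."""
    return sum(usage.get(k, 0) / 1_000_000 * rate for k, rate in RATES.items())

# Two requests with identical prompt lengths, very different bills:
cold = {"input": 100_000, "output": 2_000}                       # no cache hits
warm = {"input": 10_000, "cache_read": 90_000, "output": 2_000}  # mostly cached
```

Under these assumed rates the cached request costs a small fraction of the cold one, so any estimate keyed only to prompt length is off by whatever the cache hit rate happens to be.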

Related Issues

  • #10164 - Sub-agent token usage visibility (subset of this request)
  • #777 - Agent awareness of token usage (original request, minimal detail)
  • #6925 - Enterprise billback (closed with "use OTel" which doesn't solve agent-level access)

Conclusion

Token usage metrics are foundational infrastructure for the agent development ecosystem. Just as agents need Read to access files and Bash to run commands, they need GetTokenUsage to understand and optimize their resource consumption.

Without this capability:

  • Developers build inefficient agents
  • Users lack transparency and trust
  • Organizations can't justify adoption
  • The agent ecosystem remains hobbled

This feature would unlock a new level of sophistication in agent development, enabling cost-aware, self-optimizing, and transparent agents that users and organizations can confidently adopt.

ProgenyAlpha · Oct 26 '25 19:10