[Feature Request] Make Agent Aware of token usage and cost
I need the claude-code agent to be aware of its token usage and cost at any point in the workflow.
My primary goal is to include token use and cost in commit messages and session summary logs.
I want this too. I have a sense of how long a context session stays useful for a given task type, and roughly how many tokens a task should take. If Claude could be directed or advised on this in a meaningful way, it would be very useful.
Strong +1 for this feature - critical for agent development
I'm building UI/UX analysis agents and hit major blockers around token tracking that make this feature essential:
Current Blockers
1. Max/Pro Subscribers Have Zero Visibility
The `/cost` command explicitly tells Max/Pro users "don't worry about tokens" and returns no data at all. This makes it impossible to:
- Optimize agent prompts based on actual consumption
- Compare efficiency between different agent implementations
- Budget token usage across multi-agent workflows
- Justify costs to stakeholders
2. Parallel Agent Execution Breaks Manual Tracking
Claude Code encourages running agents in parallel for performance, but this makes before/after delta calculation impossible:
- Multiple agents executing simultaneously → can't attribute token usage
- Background processes consume tokens → creates noise in measurements
- Session-level totals only → no per-agent breakdown
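To make the attribution problem concrete, here is a minimal sketch with made-up numbers; `session_total()` stands in for whatever session-level counter is available. The before/after delta works for one sequential agent but mixes usage as soon as two agents overlap:

```python
# Illustrative only: why before/after deltas fail under parallelism.
session = {"total_tokens": 0}

def session_total():
    return session["total_tokens"]

def run_agent(name, cost):
    # Simulates an agent consuming tokens against the shared session counter.
    session["total_tokens"] += cost

# Sequential: the delta cleanly attributes usage to one agent.
before = session_total()
run_agent("ui-analysis", cost=2400)
print("sequential delta:", session_total() - before)  # 2400, correct

# Parallel: imagine the next two agents run concurrently; the only
# observable delta is their sum, which cannot be attributed to either one.
before = session_total()
run_agent("ui-analysis", cost=2400)
run_agent("a11y-audit", cost=1300)
print("parallel delta:", session_total() - before)  # 3700, unattributable
```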
3. No Programmatic Access for Agents
Agents can't query their own consumption metrics. What's needed is a tool agents can call:
```python
GetTokenUsage(scope="current_agent")
# Returns:
# {
#   "input_tokens": 1800,
#   "output_tokens": 600,
#   "cache_read_tokens": 450,
#   "cache_creation_tokens": 200,
#   "total": 3050,
#   "model": "claude-sonnet-4-5-20250929"
# }
```
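With a response like that, an agent could estimate its own cost. A minimal sketch, using illustrative placeholder prices (the real per-model rates are an assumption here, not taken from any official price list):

```python
# Sketch: estimate dollar cost from a GetTokenUsage-style response.
# Prices are illustrative placeholders, not official rates.
PRICE_PER_MTOK = {
    "input": 3.00,           # $ per million input tokens (assumed)
    "output": 15.00,         # $ per million output tokens (assumed)
    "cache_read": 0.30,      # cache reads are typically heavily discounted
    "cache_creation": 3.75,  # cache writes typically carry a premium
}

def estimate_cost(usage: dict) -> float:
    """Estimate dollar cost from a token-usage breakdown."""
    return (
        usage["input_tokens"] * PRICE_PER_MTOK["input"]
        + usage["output_tokens"] * PRICE_PER_MTOK["output"]
        + usage["cache_read_tokens"] * PRICE_PER_MTOK["cache_read"]
        + usage["cache_creation_tokens"] * PRICE_PER_MTOK["cache_creation"]
    ) / 1_000_000

usage = {
    "input_tokens": 1800,
    "output_tokens": 600,
    "cache_read_tokens": 450,
    "cache_creation_tokens": 200,
}
print(f"${estimate_cost(usage):.4f}")
```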
Real Use Case: UI/UX Analysis Agents
When analyzing multiple components, I need agents to self-report:
```
✓ UI Analysis Agent completed
- Analyzed 12 components
- Found 8 accessibility issues
- Token usage: 2,400 tokens (~$0.03)
  • Input: 1,800 tokens
  • Output: 600 tokens
  • Cache savings: 450 tokens
```
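Given a usage breakdown like the proposed tool would return, producing that self-report is trivial. A sketch (the function name and cost figure are illustrative, not an existing API):

```python
# Sketch: format an agent's self-reported usage summary.
def format_usage_summary(components: int, issues: int, usage: dict,
                         cost_usd: float) -> str:
    total = usage["input_tokens"] + usage["output_tokens"]
    lines = [
        "✓ UI Analysis Agent completed",
        f"- Analyzed {components} components",
        f"- Found {issues} accessibility issues",
        f"- Token usage: {total:,} tokens (~${cost_usd:.2f})",
        f"  • Input: {usage['input_tokens']:,} tokens",
        f"  • Output: {usage['output_tokens']:,} tokens",
        f"  • Cache savings: {usage['cache_read_tokens']:,} tokens",
    ]
    return "\n".join(lines)

print(format_usage_summary(
    components=12, issues=8,
    usage={"input_tokens": 1800, "output_tokens": 600,
           "cache_read_tokens": 450},
    cost_usd=0.03,
))
```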
This enables:
- Users choosing between fast/cheap vs. thorough/expensive analysis
- Developers optimizing agent efficiency over iterations
- Organizations justifying agent resource allocation
Why OpenTelemetry Doesn't Solve This
I saw #6925 was closed with "use OTel", but OpenTelemetry:
- Requires admin infrastructure setup (not self-service)
- Provides only session-level attribution, not per-agent
- Depends on an external monitoring stack
- Still gives no visibility to Max/Pro users
Agents need self-service access to their own metrics - like how they can call `Read` or `Bash` without infrastructure setup.
Proposed Solution
Add `GetTokenUsage` as a built-in tool (similar to existing tools):
- Agent-scoped metrics - Track consumption per agent invocation
- Subtask attribution - When agents spawn sub-agents, track hierarchy (relates to #10164)
- Max/Pro support - Show token counts even if costs are hidden
- Real-time access - Agents query during/after execution
- Zero infrastructure - Works out of the box like other tools
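To illustrate what subtask attribution could look like, here is a minimal in-memory sketch of per-agent accounting with parent/child rollup. The class and its methods are hypothetical, not part of Claude Code:

```python
from collections import defaultdict

# Hypothetical sketch of per-agent token accounting with sub-agent rollup.
class UsageTracker:
    def __init__(self):
        self.usage = defaultdict(int)      # agent_id -> own token count
        self.children = defaultdict(list)  # agent_id -> spawned sub-agents

    def record(self, agent_id: str, tokens: int):
        self.usage[agent_id] += tokens

    def spawn(self, parent_id: str, child_id: str):
        self.children[parent_id].append(child_id)

    def total(self, agent_id: str) -> int:
        """Own usage plus everything consumed by the sub-agent subtree."""
        return self.usage[agent_id] + sum(
            self.total(child) for child in self.children[agent_id]
        )

tracker = UsageTracker()
tracker.record("orchestrator", 500)
tracker.spawn("orchestrator", "ui-analysis")
tracker.record("ui-analysis", 2400)
tracker.spawn("orchestrator", "a11y-audit")
tracker.record("a11y-audit", 1300)

print(tracker.total("ui-analysis"))   # per-agent breakdown
print(tracker.total("orchestrator"))  # hierarchical rollup
```

The same scoping distinction is what `scope="current_agent"` in the proposed tool would express: an agent's own number versus its subtree total.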
Impact
Without this, building production-quality agents means optimizing blind. Token estimates derived from input/output text lengths become highly inaccurate once prompt caching, context-window management, and multi-turn conversations come into play.
As the agent ecosystem grows, this becomes increasingly critical for developers who need to justify resource allocation and optimize their implementations.