
[BUG] Token Limit Hard-Stop Without Warning or Auto-Compaction

**Open** · smartwatermelon opened this issue 1 month ago · 4 comments

Preflight Checklist

  • [x] I have searched existing issues and this hasn't been reported yet
  • [x] This is a single bug report (please file separate reports for different bugs)
  • [x] I am using the latest version of Claude Code

What's Wrong?

Claude Code reaches the 200k token limit and abruptly stops mid-conversation without any proactive warnings or automatic compaction. The conversation hits a hard-stop displaying "Context limit reached · /compact or /clear to continue" with no prior indication that the limit was approaching.

Current problematic behavior:

  1. No proactive warnings - Users receive no notification when approaching the token limit (e.g., at 80%, 90%, or 95% usage)
  2. No automatic compaction - The conversation does not auto-compact to preserve recent context and continue working
  3. Hard-stop mid-task - Work stops immediately, potentially in the middle of complex multi-step operations
  4. Poor visibility - Token usage appears only in <system_warning> tags within tool results, not in user-facing output
  5. Data loss risk - Users may not know to /export before the limit is reached
  6. Workflow disruption - Requires manual intervention (/compact or /clear) to continue, losing momentum on active development tasks

This is a serious bug in Claude Code CLI's token management system that breaks the user experience during extended development sessions.

What Should Happen?

Claude Code should implement graceful degradation with progressive warnings and automatic compaction:

Progressive Warning System

At 80% token usage (160k/200k):

⚠️  Token usage: 160000/200000 (80%)
Consider running /compact to free up context, or save your work soon.

At 95% token usage (190k/200k):

⚠️  Token usage: 190000/200000 (95%) - Approaching limit!
Please run /compact now to continue working, or the conversation will auto-compact in 30 seconds.
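The proposed warning logic can be sketched as follows. This is a hypothetical illustration of the feature request, not Claude Code's actual implementation; the `check_usage` function, the `already_warned` bookkeeping, and the threshold constants are all assumptions introduced here (integer percent math avoids floating-point edge cases at exact thresholds):

```python
# Hypothetical sketch of the proposed progressive-warning behavior.
# None of these names are real Claude Code internals.
CONTEXT_LIMIT = 200_000
WARN_PERCENTS = (80, 95)  # proposed default thresholds

def check_usage(tokens_used: int, already_warned: set) -> list:
    """Return user-facing warnings for any newly crossed thresholds.

    Each threshold fires at most once per conversation, tracked in
    the caller-owned `already_warned` set.
    """
    warnings = []
    for pct in WARN_PERCENTS:
        # Integer comparison: fires exactly at pct% of the limit.
        if tokens_used * 100 >= pct * CONTEXT_LIMIT and pct not in already_warned:
            already_warned.add(pct)
            warnings.append(
                f"⚠️  Token usage: {tokens_used}/{CONTEXT_LIMIT} ({pct}%)"
            )
    return warnings
```

Calling this after every turn would surface the 80% warning once at 160k tokens and the 95% warning once at 190k, instead of the current silent climb to the hard-stop.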

Automatic Compaction

At 98% token usage (196k/200k):

  • Automatically compact the conversation to preserve recent context
  • Display progress: "Auto-compacting conversation to free up tokens..."
  • Continue working seamlessly without user intervention
  • Preserve:
    • Current task details
    • Recent tool outputs
    • User's original request
    • Active file contents

Never Hard-Stop Mid-Conversation

The current hard-stop behavior should only occur as an absolute last resort, and if necessary, should happen gracefully at a task boundary (not mid-operation).

User Control

Provide settings to:

  • Configure warning thresholds (default: 80%, 95%)
  • Enable/disable auto-compaction (default: enabled)
  • Set auto-compaction trigger point (default: 98%)
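Such settings might look like the following in a user configuration file. All three key names are hypothetical, proposed by this report; they are not existing Claude Code settings:

```json
{
  "contextWarningThresholds": [80, 95],
  "autoCompact": true,
  "autoCompactThreshold": 98
}
```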

Error Messages/Logs

### Current Hard-Stop Message

```
⎿  Context limit reached · /compact or /clear to continue

✻ Cogitated for 33m 34s
```

### Token Usage in System Warnings (not visible to users in normal output)

```
<system_warning>Token usage: 48278/200000; 151722 remaining</system_warning>
[... many interactions later ...]
⎿  Context limit reached · /compact or /clear to continue
```


**Key issue:** Token usage warnings appear only in XML tags within tool results, not in user-facing conversational output. Users have no visibility into approaching limits.

### No Error Logs

The hard-stop does not generate error logs or stack traces - it simply displays the "Context limit reached" message and halts all interaction.

Steps to Reproduce

  1. Start a long development session with Claude Code CLI in any project:

    claude-code
    
  2. Engage in extended multi-step work that generates high token usage:

    • Complex debugging sessions with multiple file reads
    • Multi-file refactoring tasks
    • Test generation workflows (multiple test files)
    • Long implementation plans with review cycles
  3. Continue working without manually checking token usage - rely on the tool to warn you

  4. Observe: No warnings appear as token usage climbs (80%, 90%, 95%)

  5. Observe: At ~200k tokens, conversation hits hard-stop:

    ⎿  Context limit reached · /compact or /clear to continue
    
  6. Observe: No prior warnings were given in user-facing output

  7. Observe: No automatic compaction occurred

  8. Result: User must manually /export to save context, then /compact or /clear to continue

Minimal Reproduction Example

Since this requires reaching 200k tokens, here's an accelerated test scenario:

```shell
# 1. Start Claude Code
claude-code

# 2. Execute a token-heavy workflow; ask Claude to read and analyze
#    multiple large files repeatedly, for example:
#      "Please read all TypeScript files in the src/ directory and provide detailed analysis of each"
#      "Now read all test files and compare them to the source files"
#      "Now generate comprehensive tests for each source file"
#    [Continue with similar requests until approaching 200k tokens]

# 3. Monitor token usage in system warnings (developer mode); look for:
#      <system_warning>Token usage: X/200000; Y remaining</system_warning>

# 4. Observe no user-facing warnings at 80%, 90%, 95% thresholds

# 5. Observe hard-stop at 200k tokens with no auto-compaction
```

Environment Details

  • Claude Code Version: Latest as of 2026-01-16
  • Model: claude-sonnet-4-5-20250929
  • OS: macOS (Darwin 25.2.0)
  • Shell: GNU bash 5.3.9
  • Session Type: Interactive CLI development session

Contributing Factors

This issue is particularly critical for:

  • Power users with rigorous testing/review protocols (high token usage per task)
  • Complex multi-file refactoring tasks
  • Test generation workflows (multiple test files)
  • Long debugging sessions with extensive file exploration
  • Multi-step implementation plans with review cycles

Users following best practices (comprehensive testing, code review, documentation) naturally generate high token usage, making this bug a significant barrier to productive extended development sessions.


Additional Context

Workarounds (Current):

Users must manually:

  1. Monitor token usage in <system_warning> tags (requires knowing where to look)
  2. Periodically run /compact preemptively (disruptive to workflow)
  3. Use /export before hitting limit to save context (requires anticipating the limit)
  4. Run /clear to hard-reset after hitting limit (loses all context)

Priority: High - This affects usability during extended development sessions and can cause data loss. The fix would significantly improve the user experience for power users working on complex tasks.

Claude Model

Sonnet (default)

Is this a regression?

I don't know

Last Working Version

No response

Claude Code Version

2.1.9

Platform

Anthropic API

Operating System

macOS

Terminal/Shell

iTerm2

Additional Information

No response

smartwatermelon · Jan 17 '26