
[BUG] /compact fails when context ~90% full due to max_tokens not being accounted for

Open · lurenss opened this issue 3 months ago

Fix: Account for max_tokens in /compact context window calculation

Problem Description

The /compact command fails with an API error when the context window is around 90% full, even though it reports having ~10% available space. The error message is:

Error during compaction: Error: API Error: 400 {"type":"error","error":{"type":"invalid_request_error","message":"input length and `max_tokens` exceed context limit: 198667 + 20000 > 200000, decrease input length or `max_tokens` and try again"},"request_id":"req_011CTTAxjzYcksbfD5ErTeTB"}

Root Cause Analysis

After analyzing the minified code in /Users/lurens/.claude/local/node_modules/@anthropic-ai/claude-code/cli.js, I identified the issue:

  1. The qb function (line 2211) returns the context window size: 200,000 for standard models
  2. The aj function calculates available context percentage but doesn't account for output tokens
  3. The compaction uses maxOutputTokensOverride: lx1 where lx1 = 20000 (line 2210)
  4. When checking if compaction is possible, the system only considers input tokens (~198,667) and reports ~10% available
  5. But the actual API call requires: input_tokens (198,667) + max_tokens (20,000) = 218,667 tokens total
  6. This exceeds the 200,000 limit, causing the API error (illustrated in the sketch below)
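
To make the mismatch concrete, here is a minimal sketch in plain JavaScript (descriptive constant names replace the minified identifiers; the numbers come from the error message above) of the check the meter effectively performs versus the constraint the API actually enforces:

const CONTEXT_LIMIT = 200000;  // qb() for standard models
const MAX_OUTPUT = 20000;      // lx1, passed as maxOutputTokensOverride during compaction
const inputTokens = 198667;    // conversation size taken from the error message

// Check as currently performed: only input tokens vs. the context limit
const meterSaysRoomLeft = inputTokens < CONTEXT_LIMIT;                  // true

// What the API actually enforces: input tokens plus reserved output tokens
const requestActuallyFits = inputTokens + MAX_OUTPUT <= CONTEXT_LIMIT;  // false: 218,667 > 200,000

console.log({ meterSaysRoomLeft, requestActuallyFits });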

Proposed Fix

The aj function needs to account for the output tokens reserved for the response. Here's the conceptual fix:

Current Logic (Problematic)

function aj(A) {
    let B = H9B() - kN6;  // Total available minus buffer
    let Q = pd() ? B : H9B();  // Use auto-compact threshold if enabled
    let Z = Math.max(0, Math.round((Q - A) / Q * 100));  // Percent left
    // ... rest of calculation
}

Fixed Logic

function aj(A) {
    let B = H9B() - kN6;  // Total available minus buffer
    let Q = pd() ? B : H9B();  // Use auto-compact threshold if enabled

    // Account for max_tokens needed for compaction output
    let effectiveAvailable = Q - lx1;  // Subtract 20000 tokens for output
    let Z = Math.max(0, Math.round((effectiveAvailable - A) / effectiveAvailable * 100));

    // Adjust thresholds to account for output space
    let G = effectiveAvailable - _N6;  // Warning threshold
    let Y = effectiveAvailable - xN6;  // Error threshold
    let W = A >= G;
    let I = A >= Y;
    let J = pd() && A >= (B - lx1);  // Auto-compact threshold with output buffer

    return {
        percentLeft: Z,
        isAboveWarningThreshold: W,
        isAboveErrorThreshold: I,
        isAboveAutoCompactThreshold: J
    };
}
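
As a quick sanity check of the percentage math (assuming H9B() resolves to the 200,000-token window, pd() returns false so Q = H9B(), and lx1 = 20,000), the failing case from the error message would now report 0% left instead of suggesting there is still room:

const Q = 200000;    // assumed H9B() value; the kN6 auto-compact buffer is ignored here
const lx1 = 20000;   // max_tokens reserved for the compaction response
const A = 198667;    // current input tokens, from the error message

const effectiveAvailable = Q - lx1;  // 180,000 tokens of genuinely usable input space
const percentLeft = Math.max(0, Math.round((effectiveAvailable - A) / effectiveAvailable * 100));

console.log(percentLeft);  // 0 -- the UI would warn well before the request can no longer fit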

Alternative Solutions

  1. Reduce max_tokens for compaction: Change lx1 from 20,000 to 10,000
  2. Dynamic max_tokens: Calculate max_tokens based on available space (see the sketch after this list)
  3. More conservative thresholds: Increase the buffer values (kN6, _N6, xN6)
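
A rough sketch of option 2 (the helper name and the floor value are hypothetical, not taken from cli.js): shrink max_tokens to whatever still fits in the window, and bail out when there is not enough headroom for a useful summary:

function compactionMaxTokens(inputTokens, contextLimit = 200000, preferred = 20000, floor = 1000) {
    const remaining = contextLimit - inputTokens;
    if (remaining < floor) return null;     // not enough headroom; caller must refuse or trim first
    return Math.min(preferred, remaining);  // use the preferred cap when it fits, otherwise shrink
}

console.log(compactionMaxTokens(150000));  // 20000 -- plenty of room, keep the default cap
console.log(compactionMaxTokens(198667));  // 1333  -- request still fits: 198,667 + 1,333 <= 200,000

In practice the floor would probably need to be higher for the summary to be useful, which is why combining this with the threshold fix above seems like the more robust route.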

Impact

This fix ensures that:

  • The /compact command works reliably when context is ~90% full
  • The percentage calculation accurately reflects usable space
  • Users get proper warnings before hitting the limit

Testing

To verify the fix:

  1. Fill the context to ~90% (so the meter shows ~10% available)
  2. Run the /compact command
  3. Compaction should complete successfully without API errors

Code References

  • lx1 = 20000 (line 2210) - max tokens for compaction
  • qb function (line 2211) - returns context window size
  • aj function - calculates available context percentage
  • H9B function - calculates total available tokens
  • px1 function - performs compaction with maxOutputTokensOverride: lx1

lurenss · Sep 24 '25

I think the bug where the reserved memory is double-counted is now fixed as of claude-sonnet-4-5-20250929 (Claude Code version 2.0.13). For details, see the closing comment in: https://github.com/anthropics/claude-code/issues/8914

halso · Oct 10 '25

Found something interesting: Claude Code auto-reads .md files from ~/.claude/ AFTER you compact - meaning your instructions can survive compaction.

Tested with verification codes: after compacting, Claude knew codes I had never mentioned in the session, which confirms the files were auto-loaded.

Impact: You can have persistent context that survives compaction, but only if you manage your file count strategically (I found a 5-file limit).

Full investigation: https://gist.github.com/Kevthetech143/f6962aa451253f23aba49175fc49f366

@anthropics Is this intentional? Worth documenting?

Kevthetech143 · Nov 02 '25

This issue has been inactive for 30 days. If the issue is still occurring, please comment to let us know. Otherwise, this issue will be automatically closed in 30 days for housekeeping purposes.

github-actions[bot] · Dec 10 '25