
[FEATURE] Sliding window context management for long-running sessions

rickross opened this issue 1 month ago • 7 comments

The core insight

Current compaction amputates and recapitulates:

[old context] [recent work] → CHOP → [AI summary] [recent work]

We discovered a better approach - slide the window forward:

[inception] [old] [working context] → slide marker → [inception] [working context]

Instead of cutting off context and trying to recover with summaries, move the compaction marker forward through history while critical context travels with you. Like walking through time rather than repeatedly jumping off cliffs.

Why this matters

Current pain:

  • Sessions hit token limits and die
  • Compaction discards important context
  • AI-generated summaries are lossy and generic
  • Too much developer time is spent rebuilding context in new sessions
  • Starting fresh means re-explaining architecture, decisions, relevant files, and failed approaches

With sliding window:

  • Context slides forward continuously
  • Critical decisions preserved automatically
  • Working context stays intact
  • Zero rebuild time - just keep working
  • Multi-day sessions with maintained flow

Inception: Context that survives everything

Named after the film where ideas are planted deep enough to become foundational truths, inception messages are context so critical they must survive all compactions.

The concept: Just as the movie's characters planted ideas that became foundational, inception messages define:

  • Project architecture and core design decisions
  • Immutable constraints (security rules, API contracts, coding standards)
  • Critical discoveries and key requirements
  • Working preferences and non-negotiable rules

Why it matters: Without inception, long-running sessions force constant context rebuilding. You re-explain your architectural decisions, project constraints, and critical context after every compaction. Inception messages are planted once and become permanent bedrock - they travel with you through the entire session lifecycle.

Example:

Project architecture:
"This system uses event sourcing. ALL state changes must go through the event 
bus. Direct database writes are forbidden. This architectural decision is final."

Development constraints:
"When working on this codebase, always run tests before committing. Prefer 
functional patterns over OOP. Never modify files in /vendor/. These are 
non-negotiable."

Critical context:
"We're migrating from MongoDB to PostgreSQL. Any new features must use the new 
schema. The old system will be deprecated in Q2. This migration context must 
remain active throughout development."

These survive ALL compactions, ensuring continuity of project understanding.

Technical implementation:

  • Messages marked with preserve: true
  • Never pruned, regardless of age or token pressure
  • Slide forward with every compaction boundary
  • Form the continuous thread of project context

This is the foundation of long-running sessions - without inception, you're constantly rebuilding context instead of building on it.
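
For concreteness, here is a minimal TypeScript sketch of the metadata this implies. The interface and field names are illustrative assumptions, not OpenCode's actual message schema:

interface SessionMessage {
  id: string;
  timestamp: number;  // ms since epoch
  tokens: number;     // estimated token cost of this message
  preserve?: boolean; // inception flag: never pruned
}

// Planting an inception message is just setting the flag; the compaction
// filter (see "How the sliding works" below) does the rest.
function markInception(msg: SessionMessage): SessionMessage {
  return { ...msg, preserve: true };
}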

The discovery: Chess-clock context relevance

Through months of long-running sessions, we discovered context relevance follows active working time, not wall-clock time.

The chess-clock concept: Imagine a chess timer that only runs during actual work:

  • Timer runs during active back-and-forth exchanges
  • Timer pauses during idle gaps (meetings, lunch, overnight, thinking pauses)
  • We measure "active conversation minutes" rather than wall-clock time

Why this works:

Example: 4-hour wall-clock session with 3-hour lunch break
- Wall-clock approach: "keep last 2 hours" → includes 2 hours of nothing
- Chess-clock approach: "keep 30 active minutes" → actual working conversation

Example: Rapid-fire debugging session
- 45 minutes of intense back-and-forth
- Chess clock: 45 active minutes (all relevant)
- Wall-clock: same, but can't distinguish from idle time

In practice:

auto_prune(
  keep_active_minutes=30,        # Keep 30 minutes of active conversation
  gap_threshold=60               # Gaps longer than 60 seconds pause the clock
)

The gap_threshold (in seconds) defines when the clock pauses: any gap longer than 60 seconds stops the timer, so if you step away for lunch, that idle time doesn't count against your 30-minute window.

This preserves coherent working context while aggressively pruning old material.
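
A minimal TypeScript sketch of how that cutoff could be computed, under one reasonable reading of the pausing rule (the first gap_threshold seconds of any gap count as active, the remainder is paused time). The function name and message shape are illustrative, not the fork's actual code:

// Returns the index of the oldest message to keep.
function chessClockCutoff(
  messages: { timestamp: number }[], // sorted oldest first, ms timestamps
  keepActiveMinutes = 30,
  gapThresholdSec = 60,
): number {
  let activeMs = 0;
  for (let i = messages.length - 1; i > 0; i--) {
    const gap = messages[i].timestamp - messages[i - 1].timestamp;
    // Cap each gap at the threshold, so idle stretches (lunch, overnight)
    // barely move the clock.
    activeMs += Math.min(gap, gapThresholdSec * 1000);
    if (activeMs >= keepActiveMinutes * 60_000) return i;
  }
  return 0; // the whole session fits inside the active window
}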

How the sliding works

Traditional compaction (automatic amputation):

  • System finds the most recent compaction summary marker
  • Cuts everything before it indiscriminately
  • Generates AI summary to try recovering lost context
  • User has no control over what gets chopped

Our approach (deliberate high-water marking):

  • Nothing is deleted - all messages remain in storage
  • User examines session history and chooses a specific message as the cut point
  • Everything after that marker stays active in context
  • Messages marked with preserve: true (inception) slide forward with the window, regardless of age
  • We leverage OpenCode's existing compaction boundary - just controlling where it's placed

Example:

Instead of: "System found compaction at 10am, chopping everything before"
You get:    "I'll mark this message where we finalized the architecture as 
             the new baseline - everything after stays active, inception 
             messages come along, and nothing is lost from history"

Key insight: We don't change how compaction works - we just give users strategic control over the boundary while ensuring critical context travels forward. It's non-destructive context windowing.
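
A sketch of what that boundary filter could look like, reusing the hypothetical SessionMessage type from the inception sketch above. The full message array stays in storage; only the view handed to the model changes:

function activeContext(
  messages: SessionMessage[],
  boundaryId: string, // the user-chosen high-water mark
): SessionMessage[] {
  const boundary = messages.findIndex((m) => m.id === boundaryId);
  if (boundary < 0) return messages; // unknown marker: keep everything
  // Inception messages from before the boundary slide forward with the window.
  const carried = messages.slice(0, boundary).filter((m) => m.preserve);
  return [...carried, ...messages.slice(boundary)];
}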

Proposed mechanisms

1. Chess-clock auto-pruning
Automatically maintains working context based on active conversation time, not wall-clock time.

2. Inception (permanent preservation)
Mark critical messages that survive ALL compactions:

  • Architectural decisions
  • Project constraints
  • Key requirements
  • Important discoveries

3. Heuristic pruning (smart prioritization)
Not everything is "critical forever" or "delete immediately" - the middle ground matters (see the sketch after the example use cases below):

  • Assign priority levels 1-10 to messages
  • System makes smart decisions: "We're at 95% capacity, prune priority 3 and below"
  • Users set relative importance without micromanaging
  • More sophisticated than binary preserve/delete

Example use cases:

  • Priority 10: Inception messages (never prune)
  • Priority 7-9: Important context (prune only under pressure)
  • Priority 4-6: Useful but not critical (prune when approaching limits)
  • Priority 1-3: Low value (prune early)
  • Priority 0: Immediate removal (bloat, obsolete context)
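
A sketch of how a pruner might apply those tiers, again using the hypothetical SessionMessage type from earlier (the priority field is likewise an assumption):

function heuristicPrune(
  messages: (SessionMessage & { priority: number })[],
  tokenBudget: number,
): SessionMessage[] {
  let total = messages.reduce((sum, m) => sum + m.tokens, 0);
  const kept = new Set(messages.map((m) => m.id));
  // Candidates: priority 0 always goes; then lowest tier first, oldest first
  // within a tier. Priority 10 (inception) is never considered.
  const candidates = [...messages]
    .filter((m) => m.priority < 10)
    .sort((a, b) => a.priority - b.priority || a.timestamp - b.timestamp);
  for (const m of candidates) {
    if (m.priority > 0 && total <= tokenBudget) break;
    kept.delete(m.id);
    total -= m.tokens;
  }
  return messages.filter((m) => kept.has(m.id));
}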

4. Aggressive pruning of bloat
Mark noise for immediate removal (priority 0):

  • Massive tool outputs (giant file reads, verbose npm installs)
  • Failed debugging attempts
  • Obsolete context

5. External management tool
A CLI tool for session management outside the active session:

  • Iterate through message history without consuming tokens
  • Analyze context consumption
  • Mark messages for preservation/pruning
  • Zero impact on active session
  • Fast iteration on session management

6. Interactive context viewer (TUI)
Built-in visualization:

Session token usage: 187k/200k (93%)

Largest messages:
1. [45k] Tool: read massive-file.ts - 2h ago  [Priority: _] [Inception]
2. [32k] Tool: npm install output   - 3h ago  [Priority: _] [Inception]
3. [28k] Text: Full analysis...     - 1h ago  [Priority: _] [Inception]

Inception messages: 3 (12k tokens)
Messages marked for pruning: 0
Potential savings if pruned: 105k tokens

Think htop for session context.

Real-world results

From months of production use:

  • 3-5x longer sessions (empirically measured)
  • Eliminate rebuild overhead (no more 30-minute context restoration)
  • Continuous flow across multiple days
  • Compound productivity - insights and context accumulate instead of resetting

Use cases

  • Multi-day feature development - preserve architectural context
  • Complex debugging - keep findings, prune failed attempts
  • Large codebase work - maintain project understanding across sessions
  • Long-running development with continuity

Addresses existing issues

  • #2945 - Session automatically compacted, destroying context
  • #3031 - Not enough context to continue after compaction
  • Related context-loss issues

Implementation status

This is not a proposal - it's a proven system.

We've been running this in production for months across multiple long-running sessions:

  • Full implementation as a working fork
  • Tested across 600k+ token sessions spanning days
  • Battle-tested tools: inception, preserve, prune, auto_prune, diagnose, repair
  • External CLI for zero-token session management
  • Empirically measured 3-5x session longevity improvements

What we're offering to contribute:

  • Core modifications - Type definitions and filtering logic for the sliding window
  • ACM tools - Complete suite for preservation, pruning, and diagnosis
  • External management - CLI tool for inspecting/managing sessions without token cost
  • Chess-clock auto-pruning - Tested algorithm with configurable parameters
  • Heuristic pruning - Priority-based context management
  • Inception system - Permanent context preservation
  • Documentation - From months of real-world usage patterns

Code is ready. We use this daily. The question is whether the approach aligns with OpenCode's direction.

If interested, we can:

  1. Share the fork for evaluation
  2. Discuss design preferences before adapting for upstream
  3. Submit clean PR with tests and documentation
  4. Or maintain as fork if it doesn't fit OpenCode's vision

We're not proposing an idea - we're offering working code that solves real pain.

Questions for maintainers

  1. Does the sliding window approach align with OpenCode's vision?
  2. Should this be opt-in or automatic with user controls?
  3. Preferences on implementation:
    • Message-level vs part-level metadata?
    • Built-in TUI vs external tooling first?
  4. Interest in chess-clock auto-pruning?
  5. Value in heuristic pruning (priority levels 1-10)?
  6. Value in external management tool for zero-token session inspection?

rickross · Nov 23 '25 14:11

This issue might be a duplicate of existing issues. Please check:

  • #2945: Session automatically compacted, destroying the entire working context
  • #3031: Model in BUILD mode does not have enough context to continue after compaction
  • #3032: Soft compaction / AI global workspace metabolism
  • #3099: Agent no follow rules after compact session
  • #4317: Feature: generic /compact command, auto-compaction, and fork-aware conversations

Feel free to ignore if none of these address your specific case.

github-actions[bot] · Nov 23 '25 14:11

holy. that's detailed.

seannetlife · Nov 23 '25 15:11

wonder if this sort of thing could be tested / switched out via the opencode plugin system. i've got no idea of the plugin architecture, but sounds like it'd be cool to be able to hot swap community context management approaches.

seannetlife · Nov 23 '25 15:11

> wonder if this sort of thing could be tested / switched out via the opencode plugin system. i've got no idea of the plugin architecture, but sounds like it'd be cool to be able to hot swap community context management approaches.

We tried to figure out a way to do it without modifying core code, but it is necessary to modify the core compaction logic to provide the inception and sliding window features. We couldn't find any way around doing so.

rickross · Nov 23 '25 15:11

I used to do a similar thing with OpenWebUI, very basic though.

Keep the first few messages in the conversation to retain the overall goal, then cull anything after that until the context window is within limits. Not really a compaction, more of a rolling cull. It did seem to work OK though.

This implementation is much more thorough and I can see it working really well - would be good to see this in action. Compaction right now is a massive pain point. I find the OpenCode implementation fairly mediocre and the Claude Code one actually not that bad... However, a summary compaction can only do so much!

SteveyBoros · Nov 24 '25 10:11

This seems like a great alternative to compaction as a summary message (as it works right now). Compaction currently also has the issue that custom commands are pruned away. For example, if I start my session with a custom command and then compaction hits, the custom command setup is gone. Is this mitigated by your method? Would the initial message stay there with preserve: true?

fkukuck · Dec 09 '25 09:12

> For example, if I start my session with a custom command and then compaction hits, the custom command setup is gone. Is this mitigated by your method? Would the initial message stay there with preserve: true?

If you mark a message as "preserved" it survives compaction 100% intact. It is super easy to preserve messages and list preserved messages using the acm_preserve tool.

Also, fwiw, I don't think I have actually used /compact in months. I just run and run until my context is around 95% or more, then I tell my agent to "acm_prune 30" and it compacts away everything from more than 30 minutes ago (using the chess-clock time model).

And there are ACM tools to map the context, to hunt for bloaty messages and to precision snipe them. Sometimes the culprit is just one long tool result, and acm_hunt + acm_snipe help you find and blast that kind of bloat very easily.

rickross · Dec 09 '25 13:12

This needs more love

SteveyBoros · Dec 10 '25 08:12

> This needs more love

I probably would have packaged the whole thing as a giant PR, but the pace of releases of the opencode project is so rapid that I wouldn't know what to use as a baseline release. I just merge from the upstream code once or twice a week, so I can have the latest opencode stuff. There's no way I could/would go back to not having the ACM (active context management) at this point!

We have also written a plugin that logs every turn of dialogue into a PostgreSQL database with full-text and vector searching. Using a simple cli tool we can now search the entire history of all the AI conversations, so we can restore context quickly on virtually any topic. ACM and this logging/search/recall capability have been serious game changers.

rickross · Dec 10 '25 12:12

Paging @rekram1-node for his opinion, as this could potentially be a great feature that would differentiate OpenCode from similar tools such as Claude Code

fkukuck · Dec 10 '25 12:12