[FEATURE] Sliding window context management for long-running sessions
The core insight
Current compaction amputates and recapitulates:
[old context] [recent work] → CHOP → [AI summary] [recent work]
We discovered a better approach - slide the window forward:
[inception] [old] [working context] → slide marker → [inception] [working context]
Instead of cutting off context and trying to recover with summaries, move the compaction marker forward through history while critical context travels with you. Like walking through time rather than repeatedly jumping off cliffs.
Why this matters
Current pain:
- Sessions hit token limits and die
- Compaction discards important context
- AI-generated summaries are lossy and generic
- Too much developer time is spent rebuilding context in new sessions
- Starting fresh means re-explaining architecture, decisions, relevant files, and failed approaches
With sliding window:
- Context slides forward continuously
- Critical decisions preserved automatically
- Working context stays intact
- Zero rebuild time - just keep working
- Multi-day sessions with maintained flow
Inception: Context that survives everything
Named after the film where ideas are planted deep enough to become foundational truths, inception messages are context so critical they must survive all compactions.
The concept: Just as the movie's characters planted ideas that became foundational, inception messages define:
- Project architecture and core design decisions
- Immutable constraints (security rules, API contracts, coding standards)
- Critical discoveries and key requirements
- Working preferences and non-negotiable rules
Why it matters: Without inception, long-running sessions force constant context rebuilding. You re-explain your architectural decisions, project constraints, and critical context after every compaction. Inception messages are planted once and become permanent bedrock - they travel with you through the entire session lifecycle.
Example:
Project architecture:
"This system uses event sourcing. ALL state changes must go through the event
bus. Direct database writes are forbidden. This architectural decision is final."
Development constraints:
"When working on this codebase, always run tests before committing. Prefer
functional patterns over OOP. Never modify files in /vendor/. These are
non-negotiable."
Critical context:
"We're migrating from MongoDB to PostgreSQL. Any new features must use the new
schema. The old system will be deprecated in Q2. This migration context must
remain active throughout development."
These survive ALL compactions, ensuring continuity of project understanding.
Technical implementation:
- Messages marked with `preserve: true` are never pruned, regardless of age or token pressure
- Slide forward with every compaction boundary
- Form the continuous thread of project context
This is the foundation of long-running sessions - without inception, you're constantly rebuilding context instead of building on it.
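A minimal sketch of that invariant in TypeScript (names are illustrative, not OpenCode's actual types):

```ts
// Minimal sketch of the inception invariant. `preserve` is an assumed
// message-level flag, not necessarily the real field name.
interface Message {
  id: string;
  text: string;
  preserve?: boolean; // set once; exempt from every compaction thereafter
}

// Every pruning policy only ever considers prunable messages; preserved
// ones are re-attached ahead of each new compaction boundary.
const prunable = (m: Message): boolean => !m.preserve;
```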
The discovery: Chess-clock context relevance
Through months of long-running sessions, we discovered context relevance follows active working time, not wall-clock time.
The chess-clock concept: Imagine a chess timer that only runs during actual work:
- Timer runs during active back-and-forth exchanges
- Timer pauses during idle gaps (meetings, lunch, overnight, thinking pauses)
- We measure "active conversation minutes" rather than wall-clock time
Why this works:
Example: 4-hour wall-clock session with 3-hour lunch break
- Wall-clock approach: "keep last 2 hours" → includes 2 hours of nothing
- Chess-clock approach: "keep 30 active minutes" → actual working conversation
Example: Rapid-fire debugging session
- 45 minutes of intense back-and-forth
- Chess clock: 45 active minutes (all relevant)
- Wall-clock: same, but can't distinguish from idle time
In practice:
auto_prune(
    keep_active_minutes=30,  # Keep 30 minutes of active conversation
    gap_threshold=60         # Gaps longer than 60 seconds pause the clock
)
The gap_threshold (in seconds) defines when the clock pauses. A gap longer than 60 seconds pauses the timer - if you step away for lunch, that time doesn't count against your 30-minute window.
This preserves coherent working context while aggressively pruning old material.
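A hedged sketch of the clock itself, assuming timestamped turns (the shapes and names here are illustrative, not OpenCode's API):

```ts
interface Turn {
  at: number; // unix timestamp, in seconds
}

// Walk the history newest-to-oldest, accruing time only across short gaps;
// a gap longer than gapThresholdSec leaves the clock paused for that span.
// Returns the index of the oldest turn inside the active window.
function chessClockCutoff(
  turns: Turn[],
  keepActiveMinutes: number,
  gapThresholdSec: number,
): number {
  let activeSec = 0;
  for (let i = turns.length - 1; i > 0; i--) {
    const gap = turns[i].at - turns[i - 1].at;
    if (gap <= gapThresholdSec) activeSec += gap; // clock runs
    if (activeSec >= keepActiveMinutes * 60) return i;
  }
  return 0; // the whole session fits inside the active window
}
```

With keep_active_minutes=30 and gap_threshold=60, everything older than the turn this returns would be pruned.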
How the sliding works
Traditional compaction (automatic amputation):
- System finds the most recent compaction summary marker
- Cuts everything before it indiscriminately
- Generates an AI summary to try to recover the lost context
- User has no control over what gets chopped
Our approach (deliberate high-water marking):
- Nothing is deleted - all messages remain in storage
- User examines session history and chooses a specific message as the cut point
- Everything after that marker stays active in context
- Messages marked with `preserve: true` (inception) slide forward with the window, regardless of age
- We leverage OpenCode's existing compaction boundary - just controlling where it's placed
Example:
Instead of: "System found compaction at 10am, chopping everything before"
You get: "I'll mark this message where we finalized the architecture as
the new baseline - everything after stays active, inception
messages come along, and nothing is lost from history"
Key insight: We don't change how compaction works - we just give users strategic control over the boundary while ensuring critical context travels forward. It's non-destructive context windowing.
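Conceptually the whole mechanism reduces to one filter over stored history - a sketch with illustrative names, not the actual implementation:

```ts
interface Message {
  id: string;
  text: string;
  preserve?: boolean; // inception flag
}

// Slide the boundary to a user-chosen message: everything from that message
// onward stays active, preserved messages travel forward, and the full
// history remains untouched in storage.
function slideWindow(history: Message[], boundaryId: string): Message[] {
  const i = history.findIndex((m) => m.id === boundaryId);
  if (i < 0) return history; // unknown marker: leave the context as-is
  return [
    ...history.slice(0, i).filter((m) => m.preserve), // inception travels
    ...history.slice(i), // the new baseline onward
  ];
}
```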
Proposed mechanisms
1. Chess-clock auto-pruning: Automatically maintains working context based on active conversation time, not wall-clock time.
2. Inception (permanent preservation): Mark critical messages that survive ALL compactions:
- Architectural decisions
- Project constraints
- Key requirements
- Important discoveries
3. Heuristic pruning (smart prioritization): Not everything is "critical forever" or "delete immediately" - the middle ground matters:
- Assign priority levels 1-10 to messages
- System makes smart decisions: "We're at 95% capacity, prune priority 3 and below"
- Users set relative importance without micromanaging
- More sophisticated than binary preserve/delete (a sketch follows the examples below)
Example use cases:
- Priority 10: Inception messages (never prune)
- Priority 7-9: Important context (prune only under pressure)
- Priority 4-6: Useful but not critical (prune when approaching limits)
- Priority 1-3: Low value (prune early)
- Priority 0: Immediate removal (bloat, obsolete context)
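A sketch of how such a policy could decide under pressure (fields and names are assumptions, not the shipped implementation):

```ts
interface RankedMessage {
  id: string;
  tokens: number;
  priority: number; // 0-10, user- or heuristic-assigned
}

// Drop the lowest-priority messages until the context fits the budget.
// Priority 10 is never dropped; priority 0 is dropped unconditionally.
function pruneByPriority(history: RankedMessage[], budget: number): RankedMessage[] {
  let used = history.reduce((sum, m) => sum + m.tokens, 0);
  const dropped = new Set<string>();
  const candidates = [...history]
    .filter((m) => m.priority < 10)
    .sort((a, b) => a.priority - b.priority); // cheapest to lose first
  for (const m of candidates) {
    if (used <= budget && m.priority > 0) break; // we fit, and no bloat remains
    dropped.add(m.id);
    used -= m.tokens;
  }
  return history.filter((m) => !dropped.has(m.id));
}
```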
4. Aggressive pruning of bloat: Mark noise for immediate removal (priority 0):
- Massive tool outputs (giant file reads, verbose npm installs)
- Failed debugging attempts
- Obsolete context
5. External management tool: A CLI for session management outside the active session (sketched after the list):
- Iterate through message history without consuming tokens
- Analyze context consumption
- Mark messages for preservation/pruning
- Zero impact on active session
- Fast iteration on session management
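A sketch of the zero-token idea (file layout and field names are assumptions; the real tool would read OpenCode's actual session storage). Because it works on the on-disk history, the live session never spends a token:

```ts
import { readFileSync } from "node:fs";

interface StoredMessage {
  id: string;
  tokens: number;
  preserve?: boolean;
}

// Summarize a session file: total cost, inception count, biggest offenders.
function inspect(sessionFile: string): void {
  const msgs: StoredMessage[] = JSON.parse(readFileSync(sessionFile, "utf8"));
  const total = msgs.reduce((sum, m) => sum + m.tokens, 0);
  console.log(`messages: ${msgs.length}  tokens: ${total}`);
  console.log(`preserved (inception): ${msgs.filter((m) => m.preserve).length}`);
  for (const m of [...msgs].sort((a, b) => b.tokens - a.tokens).slice(0, 5)) {
    console.log(`  ${m.tokens}\t${m.id}${m.preserve ? " [inception]" : ""}`);
  }
}

const file = process.argv[2];
if (file) inspect(file);
```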
6. Interactive context viewer (TUI): Built-in visualization:
Session token usage: 187k/200k (93%)
Largest messages:
1. [45k] Tool: read massive-file.ts - 2h ago [Priority: _] [Inception]
2. [32k] Tool: npm install output - 3h ago [Priority: _] [Inception]
3. [28k] Text: Full analysis... - 1h ago [Priority: _] [Inception]
Inception messages: 3 (12k tokens)
Messages marked for pruning: 0
Potential savings if pruned: 105k tokens
Think htop for session context.
Real-world results
From months of production use:
- 3-5x longer sessions (empirically measured)
- Eliminate rebuild overhead (no more 30-minute context restoration)
- Continuous flow across multiple days
- Compound productivity - insights and context accumulate instead of resetting
Use cases
- Multi-day feature development - preserve architectural context
- Complex debugging - keep findings, prune failed attempts
- Large codebase work - maintain project understanding across sessions
- Long-running development with continuity
Addresses existing issues
- #2945 - Session automatically compacted, destroying context
- #3031 - Not enough context to continue after compaction
- Related context-loss issues
Implementation status
This is not a proposal - it's a proven system.
We've been running this in production for months across multiple long-running sessions:
- Full implementation as a working fork
- Tested across 600k+ token sessions spanning days
- Battle-tested tools: inception, preserve, prune, auto_prune, diagnose, repair
- External CLI for zero-token session management
- Empirically measured 3-5x session longevity improvements
What we're offering to contribute:
✅ Core modifications - Type definitions and filtering logic for sliding window
✅ ACM (active context management) tools - Complete suite for preservation, pruning, and diagnosis
✅ External management - CLI tool for inspecting/managing sessions without token cost
✅ Chess-clock auto-pruning - Tested algorithm with configurable parameters
✅ Heuristic pruning - Priority-based context management
✅ Inception system - Permanent context preservation
✅ Documentation - From months of real-world usage patterns
Code is ready. We use this daily. The question is whether the approach aligns with OpenCode's direction.
If interested, we can:
- Share the fork for evaluation
- Discuss design preferences before adapting for upstream
- Submit clean PR with tests and documentation
- Or maintain as fork if it doesn't fit OpenCode's vision
We're not proposing an idea - we're offering working code that solves real pain.
Questions for maintainers
- Does the sliding window approach align with OpenCode's vision?
- Should this be opt-in or automatic with user controls?
- Preferences on implementation:
  - Message-level vs part-level metadata?
  - Built-in TUI vs external tooling first?
- Interest in chess-clock auto-pruning?
- Value in heuristic pruning (priority levels 1-10)?
- Value in external management tool for zero-token session inspection?
This issue might be a duplicate of existing issues. Please check:
- #2945: Session automatically compacted, destroying the entire working context
- #3031: Model in BUILD mode does not have enough context to continue after compaction
- #3032: Soft compaction / AI global workspace metabolism
- #3099: Agent no follow rules after compact session
- #4317: Feature: generic /compact command, auto-compaction, and fork-aware conversations
Feel free to ignore if none of these address your specific case.
holy. that's detailed.
wonder if this sort of thing could be tested / switched out via the opencode plugin system. i've got no idea of the plugin architecture, but sounds like it'd be cool to be able to hot swap community context management approaches.
We tried to figure out a way to do it without modifying core code, but it is necessary to modify the core compaction logic to provide the inception and sliding window features. We couldn't find any way around doing so.
I used to do a similar thing with OpenWebUI, though it was very basic.
Keep the first few messages in the conversation to retain the overall goal, then cull anything after that until the context window is within limits. Not really compaction, more of a rolling cull. It did seem to work OK though.
This implementation is much more thorough and I can see it working really well - would be good to see this in action. Compaction right now is a massive pain point. I find the OpenCode implementation fairly mediocre and the Claude Code one actually not that bad... However, a summary compaction can only do so much!
This seems like a great alternative to compaction as a summary message (as it is right now). Compaction right now also has the issue that custom commands are pruned away. For example, if I start my session with a custom command and then compaction hits, the custom command setup is gone. Is this mitigated by your method? Would the initial message stay there with preserve: true?
If you mark a message as "preserved" it survives compaction 100% intact. It is super easy to preserve messages and list preserved messages using the acm_preserve tool.
Also, fwiw, I don't think I have actually used /compact in months. I just run and run until my context is around 95% or more, then I tell my agent to "acm_prune 30" and it compacts away everything from more than 30 minutes ago (using the chess-clock time model).
And there are ACM tools to map the context, to hunt for bloaty messages and to precision snipe them. Sometimes the culprit is just one long tool result, and acm_hunt + acm_snipe help you find and blast that kind of bloat very easily.
This needs more love
I probably would have packaged the whole thing as a giant PR, but the pace of releases of the opencode project is so rapid that I wouldn't know what to use as a baseline release. I just merge from the upstream code once or twice a week at this point, so I can have the latest opencode stuff. There's no way I could/would go back to not having the ACM (active context management) at this point!
We have also written a plugin that logs every turn of dialogue into a PostgreSQL database with full-text and vector searching. Using a simple cli tool we can now search the entire history of all the AI conversations, so we can restore context quickly on virtually any topic. ACM and this logging/search/recall capability have been serious game changers.
Paging @rekram1-node for his opinion, as this could potentially be a great feature that would differentiate OpenCode from similar tools such as Claude Code.