[FEATURE] Sliding window context management for long-running sessions
The core insight
Current compaction amputates and recapitulates:
[old context] [recent work] → CHOP → [AI summary] [recent work]
We discovered a better approach - slide the window forward:
[inception] [old] [working context] → slide marker → [inception] [working context]
Instead of cutting off context and trying to recover with summaries, move the compaction marker forward through history while critical context travels with you. Like walking through time rather than repeatedly jumping off cliffs.
Why this matters
Current pain:
- Sessions hit token limits and die
- Compaction discards important context
- AI-generated summaries are lossy and generic
- Too much developer time is spent rebuilding context in new sessions
- Starting fresh means re-explaining architecture, decisions, relevant files, and failed approaches
With sliding window:
- Context slides forward continuously
- Critical decisions preserved automatically
- Working context stays intact
- Zero rebuild time - just keep working
- Multi-day sessions with maintained flow
Inception: Context that survives everything
Named after the film where ideas are planted deep enough to become foundational truths, inception messages are context so critical they must survive all compactions.
The concept: Just as the movie's characters planted ideas that became foundational, inception messages define:
- Project architecture and core design decisions
- Immutable constraints (security rules, API contracts, coding standards)
- Critical discoveries and key requirements
- Working preferences and non-negotiable rules
Why it matters: Without inception, long-running sessions force constant context rebuilding. You re-explain your architectural decisions, project constraints, and critical context after every compaction. Inception messages are planted once and become permanent bedrock - they travel with you through the entire session lifecycle.
Example:
Project architecture:
"This system uses event sourcing. ALL state changes must go through the event
bus. Direct database writes are forbidden. This architectural decision is final."
Development constraints:
"When working on this codebase, always run tests before committing. Prefer
functional patterns over OOP. Never modify files in /vendor/. These are
non-negotiable."
Critical context:
"We're migrating from MongoDB to PostgreSQL. Any new features must use the new
schema. The old system will be deprecated in Q2. This migration context must
remain active throughout development."
These survive ALL compactions, ensuring continuity of project understanding.
Technical implementation:
- Messages marked with `preserve: true` are never pruned, regardless of age or token pressure
- Slide forward with every compaction boundary
- Form the continuous thread of project context
This is the foundation of long-running sessions - without inception, you're constantly rebuilding context instead of building on it.
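A minimal sketch of that invariant in TypeScript (names are illustrative, not OpenCode's actual types):

```ts
// Minimal sketch of the inception invariant. `preserve` is an assumed
// message-level flag, not necessarily the real field name.
interface Message {
  id: string;
  text: string;
  preserve?: boolean; // set once; exempt from every compaction thereafter
}

// Every pruning policy only ever considers prunable messages; preserved
// ones are re-attached ahead of each new compaction boundary.
const prunable = (m: Message): boolean => !m.preserve;
```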
The discovery: Chess-clock context relevance
Through months of long-running sessions, we discovered context relevance follows active working time, not wall-clock time.
The chess-clock concept: Imagine a chess timer that only runs during actual work:
- Timer runs during active back-and-forth exchanges
- Timer pauses during idle gaps (meetings, lunch, overnight, thinking pauses)
- We measure "active conversation minutes" rather than wall-clock time
Why this works:
Example: 4-hour wall-clock session with 3-hour lunch break
- Wall-clock approach: "keep last 2 hours" → includes 2 hours of nothing
- Chess-clock approach: "keep 30 active minutes" → actual working conversation
Example: Rapid-fire debugging session
- 45 minutes of intense back-and-forth
- Chess clock: 45 active minutes (all relevant)
- Wall-clock: same, but can't distinguish from idle time
In practice:
auto_prune(
    keep_active_minutes=30,  # Keep 30 minutes of active conversation
    gap_threshold=60         # Gaps longer than 60 seconds pause the clock
)
The gap_threshold (in seconds) defines when the clock pauses. A gap longer than 60 seconds pauses the timer - if you step away for lunch, that time doesn't count against your 30-minute window.
This preserves coherent working context while aggressively pruning old material.
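A hedged sketch of the clock itself, assuming timestamped turns (the shapes and names here are illustrative, not OpenCode's API):

```ts
interface Turn {
  at: number; // unix timestamp, in seconds
}

// Walk the history newest-to-oldest, accruing time only across short gaps;
// a gap longer than gapThresholdSec leaves the clock paused for that span.
// Returns the index of the oldest turn inside the active window.
function chessClockCutoff(
  turns: Turn[],
  keepActiveMinutes: number,
  gapThresholdSec: number,
): number {
  let activeSec = 0;
  for (let i = turns.length - 1; i > 0; i--) {
    const gap = turns[i].at - turns[i - 1].at;
    if (gap <= gapThresholdSec) activeSec += gap; // clock runs
    if (activeSec >= keepActiveMinutes * 60) return i;
  }
  return 0; // the whole session fits inside the active window
}
```

With keep_active_minutes=30 and gap_threshold=60, everything older than the turn this returns would be pruned.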
How the sliding works
Traditional compaction (automatic amputation):
- System finds the most recent compaction summary marker
- Cuts everything before it indiscriminately
- Generates an AI summary to try to recover the lost context
- User has no control over what gets chopped
Our approach (deliberate high-water marking):
- Nothing is deleted - all messages remain in storage
- User examines session history and chooses a specific message as the cut point
- Everything after that marker stays active in context
- Messages marked with `preserve: true` (inception) slide forward with the window, regardless of age
- We leverage OpenCode's existing compaction boundary - just controlling where it's placed
Example:
Instead of: "System found compaction at 10am, chopping everything before"
You get: "I'll mark this message where we finalized the architecture as
the new baseline - everything after stays active, inception
messages come along, and nothing is lost from history"
Key insight: We don't change how compaction works - we just give users strategic control over the boundary while ensuring critical context travels forward. It's non-destructive context windowing.
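Conceptually the whole mechanism reduces to one filter over stored history - a sketch with illustrative names, not the actual implementation:

```ts
interface Message {
  id: string;
  text: string;
  preserve?: boolean; // inception flag
}

// Slide the boundary to a user-chosen message: everything from that message
// onward stays active, preserved messages travel forward, and the full
// history remains untouched in storage.
function slideWindow(history: Message[], boundaryId: string): Message[] {
  const i = history.findIndex((m) => m.id === boundaryId);
  if (i < 0) return history; // unknown marker: leave the context as-is
  return [
    ...history.slice(0, i).filter((m) => m.preserve), // inception travels
    ...history.slice(i), // the new baseline onward
  ];
}
```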
Proposed mechanisms
1. Chess-clock auto-pruning: Automatically maintains working context based on active conversation time, not wall-clock time.
2. Inception (permanent preservation): Mark critical messages that survive ALL compactions:
- Architectural decisions
- Project constraints
- Key requirements
- Important discoveries
3. Heuristic pruning (smart prioritization): Not everything is "critical forever" or "delete immediately" - the middle ground matters:
- Assign priority levels 1-10 to messages
- System makes smart decisions: "We're at 95% capacity, prune priority 3 and below"
- Users set relative importance without micromanaging
- More sophisticated than binary preserve/delete (a sketch follows the examples below)
Example use cases:
- Priority 10: Inception messages (never prune)
- Priority 7-9: Important context (prune only under pressure)
- Priority 4-6: Useful but not critical (prune when approaching limits)
- Priority 1-3: Low value (prune early)
- Priority 0: Immediate removal (bloat, obsolete context)
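A sketch of how such a policy could decide under pressure (fields and names are assumptions, not the shipped implementation):

```ts
interface RankedMessage {
  id: string;
  tokens: number;
  priority: number; // 0-10, user- or heuristic-assigned
}

// Drop the lowest-priority messages until the context fits the budget.
// Priority 10 is never dropped; priority 0 is dropped unconditionally.
function pruneByPriority(history: RankedMessage[], budget: number): RankedMessage[] {
  let used = history.reduce((sum, m) => sum + m.tokens, 0);
  const dropped = new Set<string>();
  const candidates = [...history]
    .filter((m) => m.priority < 10)
    .sort((a, b) => a.priority - b.priority); // cheapest to lose first
  for (const m of candidates) {
    if (used <= budget && m.priority > 0) break; // we fit, and no bloat remains
    dropped.add(m.id);
    used -= m.tokens;
  }
  return history.filter((m) => !dropped.has(m.id));
}
```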
4. Aggressive pruning of bloat: Mark noise for immediate removal (priority 0):
- Massive tool outputs (giant file reads, verbose npm installs)
- Failed debugging attempts
- Obsolete context
5. External management tool: A CLI for session management outside the active session (sketched after the list):
- Iterate through message history without consuming tokens
- Analyze context consumption
- Mark messages for preservation/pruning
- Zero impact on active session
- Fast iteration on session management
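A sketch of the zero-token idea (file layout and field names are assumptions; the real tool would read OpenCode's actual session storage). Because it works on the on-disk history, the live session never spends a token:

```ts
import { readFileSync } from "node:fs";

interface StoredMessage {
  id: string;
  tokens: number;
  preserve?: boolean;
}

// Summarize a session file: total cost, inception count, biggest offenders.
function inspect(sessionFile: string): void {
  const msgs: StoredMessage[] = JSON.parse(readFileSync(sessionFile, "utf8"));
  const total = msgs.reduce((sum, m) => sum + m.tokens, 0);
  console.log(`messages: ${msgs.length}  tokens: ${total}`);
  console.log(`preserved (inception): ${msgs.filter((m) => m.preserve).length}`);
  for (const m of [...msgs].sort((a, b) => b.tokens - a.tokens).slice(0, 5)) {
    console.log(`  ${m.tokens}\t${m.id}${m.preserve ? " [inception]" : ""}`);
  }
}

const file = process.argv[2];
if (file) inspect(file);
```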
6. Interactive context viewer (TUI): Built-in visualization:
Session token usage: 187k/200k (93%)
Largest messages:
1. [45k] Tool: read massive-file.ts - 2h ago [Priority: _] [Inception]
2. [32k] Tool: npm install output - 3h ago [Priority: _] [Inception]
3. [28k] Text: Full analysis... - 1h ago [Priority: _] [Inception]
Inception messages: 3 (12k tokens)
Messages marked for pruning: 0
Potential savings if pruned: 105k tokens
Think htop for session context.
Real-world results
From months of production use:
- 3-5x longer sessions (empirically measured)
- Eliminate rebuild overhead (no more 30-minute context restoration)
- Continuous flow across multiple days
- Compound productivity - insights and context accumulate instead of resetting
Use cases
- Multi-day feature development - preserve architectural context
- Complex debugging - keep findings, prune failed attempts
- Large codebase work - maintain project understanding across sessions
- Long-running development with continuity
Addresses existing issues
- #2945 - Session automatically compacted, destroying context
- #3031 - Not enough context to continue after compaction
- Related context-loss issues
Implementation status
This is not a proposal - it's a proven system.
We've been running this in production for months across multiple long-running sessions:
- Full implementation as a working fork
- Tested across 600k+ token sessions spanning days
- Battle-tested tools: inception, preserve, prune, auto_prune, diagnose, repair
- External CLI for zero-token session management
- Empirically measured 3-5x session longevity improvements
What we're offering to contribute:
✅ Core modifications - Type definitions and filtering logic for sliding window
✅ ACM (active context management) tools - Complete suite for preservation, pruning, and diagnosis
✅ External management - CLI tool for inspecting/managing sessions without token cost
✅ Chess-clock auto-pruning - Tested algorithm with configurable parameters
✅ Heuristic pruning - Priority-based context management
✅ Inception system - Permanent context preservation
✅ Documentation - From months of real-world usage patterns
Code is ready. We use this daily. The question is whether the approach aligns with OpenCode's direction.
If interested, we can:
- Share the fork for evaluation
- Discuss design preferences before adapting for upstream
- Submit clean PR with tests and documentation
- Or maintain as fork if it doesn't fit OpenCode's vision
We're not proposing an idea - we're offering working code that solves real pain.
Questions for maintainers
- Does the sliding window approach align with OpenCode's vision?
- Should this be opt-in or automatic with user controls?
- Preferences on implementation:
  - Message-level vs part-level metadata?
  - Built-in TUI vs external tooling first?
- Interest in chess-clock auto-pruning?
- Value in heuristic pruning (priority levels 1-10)?
- Value in external management tool for zero-token session inspection?
This issue might be a duplicate of existing issues. Please check:
- #2945: Session automatically compacted, destroying the entire working context
- #3031: Model in BUILD mode does not have enough context to continue after compaction
- #3032: Soft compaction / AI global workspace metabolism
- #3099: Agent no follow rules after compact session
- #4317: Feature: generic /compact command, auto-compaction, and fork-aware conversations
Feel free to ignore if none of these address your specific case.
holy. that's detailed.
wonder if this sort of thing could be tested / switched out via the opencode plugin system. i've got no idea of the plugin architecture, but sounds like it'd be cool to be able to hot swap community context management approaches.
We tried to figure out a way to do it without modifying core code, but it is necessary to modify the core compaction logic to provide the inception and sliding window features. We couldn't find any way around doing so.
I used to do a similar thing with OpenWebUI, though it was very basic.
Keep the first few messages in the conversation to retain the overall goal, then cull anything after that until the context window is within limits. Not really compaction, more of a rolling cull. It did seem to work OK though.
This implementation is much more thorough and I can see it working really well - would be good to see this in action. Compaction right now is a massive pain point. I find the OpenCode implementation fairly mediocre and the Claude Code one actually not that bad... However, a summary compaction can only do so much!
This seems like a great alternative to compaction as a summary message (as it is right now). Compaction right now also has the issue that custom commands are pruned away. For example, if I start my session with a custom command and then compaction hits, the custom command setup is gone. Is this mitigated by your method? Would the initial message stay there with preserve: true?
If you mark a message as "preserved" it survives compaction 100% intact. It is super easy to preserve messages and list preserved messages using the acm_preserve tool.
Also, fwiw, I don't think I have actually used /compact in months. I just run and run until my context is around 95% or more, then I tell my agent to "acm_prune 30" and it compacts away everything from more than 30 minutes ago (using the chess-clock time model).
And there are ACM tools to map the context, to hunt for bloaty messages and to precision snipe them. Sometimes the culprit is just one long tool result, and acm_hunt + acm_snipe help you find and blast that kind of bloat very easily.
This needs more love
I probably would have packaged the whole thing as a giant PR, but the pace of releases of the opencode project is so rapid that I wouldn't know what to use as a baseline release. I just merge from the upstream code once or twice a week at this point, so I can have the latest opencode stuff. There's no way I could/would go back to not having the ACM (active context management) at this point!
We have also written a plugin that logs every turn of dialogue into a PostgreSQL database with full-text and vector searching. Using a simple cli tool we can now search the entire history of all the AI conversations, so we can restore context quickly on virtually any topic. ACM and this logging/search/recall capability have been serious game changers.
Paging @rekram1-node for his opinion, as this could potentially be a great feature that would differentiate OpenCode from similar tools such as Claude Code.