There's a proposal to integrate a trigger for rotation/summarization that activates when 50-70% of the selected AI model's context window is utilized.
The implementation must automatically control the amount of context in use: when 50-70% of the selected model's maximum token window is reached, a rotation and/or summarization trigger should fire via the agent-context-resolver. This prevents overflow, information loss, and performance degradation, and also saves tokens.
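A minimal sketch of what that trigger could look like. All names here (`ContextBudget`, `checkContextPressure`) are illustrative assumptions, not Roo Code's actual agent-context-resolver API:

```typescript
// Hypothetical trigger: rotation starts in the 50-70% band of the
// model's context window; past 70% it becomes urgent.
interface ContextBudget {
  maxTokens: number;  // the selected model's maximum context window
  softLimit: number;  // fraction at which rotation should start, e.g. 0.5
  hardLimit: number;  // fraction we must never exceed, e.g. 0.7
}

type Action = "none" | "rotate" | "rotate-urgent";

function checkContextPressure(usedTokens: number, budget: ContextBudget): Action {
  const ratio = usedTokens / budget.maxTokens;
  if (ratio >= budget.hardLimit) return "rotate-urgent"; // summarize aggressively now
  if (ratio >= budget.softLimit) return "rotate";        // schedule a rotation pass
  return "none";
}

// Example: 120k tokens used of a 200k window = 60% -> "rotate"
const action = checkContextPressure(120_000, {
  maxTokens: 200_000,
  softLimit: 0.5,
  hardLimit: 0.7,
});
```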
I imagine the new whenToUse field in Roo Code will help a good deal towards this, besides dropping some of the modes.
@partounian - I fully support your proposal to add whenToUse; it will streamline routing and remove a lot of token “ballast.”
Request to the Roo Commander maintainer: please add a whenToUse entry (even a minimal one) to every existing mode.
That said, this only fixes the cosmetic layer (saving a few hundred tokens); it does not address the core risk:
- Each call still drags in too much data (long roleDefinitions, stale chat messages, log snippets), quickly filling the context window.
- In long sessions we still hit the model's context limit, triggering truncation or other destructive fallbacks.
What else is required:
- Prompt filtering: send the model only the truly relevant slice of conversation and metadata; keep legacy mode descriptions completely out of the live prompt.
- A context-manager: once ~50-70% of the window is used, automatically summarize and rotate older messages into an archive, leaving a pointer for on-demand retrieval.
Only the combination of these two steps both removes noise from every request and guarantees we never overflow the context window, no matter how long the dialogue gets.
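The summarize-and-rotate step could be sketched as follows. This is an assumption about shape, not Roo Code's implementation; `summarize` stands in for a cheap LLM call, and the archive-pointer format is invented:

```typescript
interface Message { id: string; role: "user" | "assistant"; text: string }

interface RotationResult {
  live: Message[];     // what stays in the live prompt
  archived: Message[]; // moved out of the context window
  pointer: string;     // marker the model can ask to expand on demand
}

// Keep the last `keepLast` messages verbatim; compress everything older
// into one summary message that carries a pointer to the archive.
function rotate(
  history: Message[],
  keepLast: number,
  summarize: (msgs: Message[]) => string,
): RotationResult {
  if (history.length <= keepLast) return { live: history, archived: [], pointer: "" };
  const archived = history.slice(0, history.length - keepLast);
  const recent = history.slice(history.length - keepLast);
  const pointer = `[archive:${archived[0].id}..${archived[archived.length - 1].id}]`;
  const summary: Message = {
    id: "summary-" + archived[archived.length - 1].id,
    role: "assistant",
    text: `Summary of earlier conversation ${pointer}: ${summarize(archived)}`,
  };
  return { live: [summary, ...recent], archived, pointer };
}
```

The key design point is that nothing is destroyed: truncation is replaced by a lossy-but-recoverable summary plus a pointer back to the full archive.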
In the Discord, Jez (the maintainer) mentions that he is currently working on a full rewrite cleanup.
https://github.com/OleynikAleksandr/Mode-Manager-Extension
Architectural Axiom: Who Should Understand the Task in AI-Orchestration
Essence
No expensive AI model should be responsible for filtering context, identifying relevant fragments, or assembling the system prompt.
These operations are costly, token-hungry, and repetitive. They should be delegated to cheap or local systems (LLMs, heuristics, vector databases, static rules) that:
operate for free or near-free,
can afford to make mistakes and retry,
ultimately output a compact, focused prompt.
Reasons
1. Premium models charge per byte
Roo Code with 100+ modes can consume 10–30k tokens on the initial prompt alone.
This happens before the model even starts answering.
2. The point of a system prompt is to be focused
Not to dump the entire `.roomodes`, `.history`, `.workspace`, `.tools`, `.manual`, `.faq`, `.everything`.
3. Repetition is the most expensive enemy
If the same 15k-token instruction block is sent with every request, your system doesn't generalize — it just burns money.
What the cheap model (or rule) should do
Step | Executor | Why a cheap model is fit
-- | -- | --
Classify the query | local LLM / static rule | Costs ~100 tokens, low risk if wrong
Summarize history | summarizer (cheap LLM / MCP tool) | Trims tail cheaply
Retrieve relevant snippets | vector search / heuristic | Based on matching or embeddings
Assemble system prompt | template + JS/Python | Simple, structured logic

When to involve an expensive model
Only when:
the system prompt is already composed,
history has been filtered,
relevant context has been inserted,
the user question is clean and focused.
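The "classify the query" step from the table can be as simple as a static rule set with no LLM call at all. The mode names and keywords below are invented for illustration, not Roo Commander's real mode slugs:

```typescript
type Mode = "code" | "debug" | "docs" | "general";

// First match wins: cheap, deterministic, and low risk if wrong,
// since a misroute just picks a slightly suboptimal mode.
const RULES: Array<[RegExp, Mode]> = [
  [/\b(error|stack trace|exception|fails?)\b/i, "debug"],
  [/\b(readme|document|changelog|comment)\b/i, "docs"],
  [/\b(implement|refactor|function|class|bugfix)\b/i, "code"],
];

function classify(query: string): Mode {
  for (const [pattern, mode] of RULES) {
    if (pattern.test(query)) return mode;
  }
  return "general"; // fall through to a generic mode
}
```

In practice this rule layer would sit in front of a local LLM fallback: rules handle the obvious 80%, and only ambiguous queries cost even ~100 tokens.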
Example pipeline
```
User → Orchestrator (cheap LLM) → prompt structure (no content)
        ↓
History + snippets + mode → template → system prompt
        ↓
Roo Code / GPT-4 / Claude → gives the answer
```
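The final assembly step ("template + JS/Python" in the table) needs no LLM at all: plain string templating over the outputs of the earlier stages. The field and section names are assumptions for the sketch:

```typescript
interface PromptParts {
  modeDefinition: string; // only the selected mode, not all of .roomodes
  historySummary: string; // output of the summarizer stage
  snippets: string[];     // output of vector search / heuristics
  userQuestion: string;   // the clean, focused question
}

// Deterministic template assembly: the expensive model receives
// only this compact, pre-filtered prompt.
function assembleSystemPrompt(p: PromptParts): string {
  return [
    `## Mode\n${p.modeDefinition}`,
    `## Conversation summary\n${p.historySummary}`,
    `## Relevant context\n${p.snippets.join("\n---\n")}`,
    `## Task\n${p.userQuestion}`,
  ].join("\n\n");
}
```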
Implications
- All of `.roomodes` ≠ something to load every time.
- System prompt ≠ static block, but dynamic per-task assembly.
- Cheap model ≠ text generator, but conductor of the orchestration pipeline.
Conclusion
Saving ≠ squeezing more from the prompt.
Saving = eliminating dumb repetitions and offloading “task understanding” to where it costs nothing.
@OleynikAleksandr have you had success making any of your suggested changes?
https://github.com/jezweb/roo-commander/issues/46#issue-3116811136
I'm super impressed, this well beyond anything I could have created.