There's a proposal to integrate a trigger for rotation/summarization that activates when 50-70% of the selected AI model's context window is utilized.
The implementation must automatically control the amount of context in use: when 50-70% of the selected model's maximum token window is reached, a rotation and/or summarization trigger should fire via the agent-context-resolver. This prevents overflow, information loss, and performance degradation, and also saves tokens.
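A minimal sketch of what that trigger could look like. All names here (`ContextBudget`, `checkContextPressure`) are illustrative assumptions, not Roo Code's actual agent-context-resolver API:

```typescript
// Hypothetical trigger: rotation starts in the 50-70% band of the
// model's context window; past 70% it becomes urgent.
interface ContextBudget {
  maxTokens: number;  // the selected model's maximum context window
  softLimit: number;  // fraction at which rotation should start, e.g. 0.5
  hardLimit: number;  // fraction we must never exceed, e.g. 0.7
}

type Action = "none" | "rotate" | "rotate-urgent";

function checkContextPressure(usedTokens: number, budget: ContextBudget): Action {
  const ratio = usedTokens / budget.maxTokens;
  if (ratio >= budget.hardLimit) return "rotate-urgent"; // summarize aggressively now
  if (ratio >= budget.softLimit) return "rotate";        // schedule a rotation pass
  return "none";
}

// Example: 120k tokens used of a 200k window = 60% -> "rotate"
const action = checkContextPressure(120_000, {
  maxTokens: 200_000,
  softLimit: 0.5,
  hardLimit: 0.7,
});
```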
I imagine the new whenToUse field in Roo Code will help a good deal towards this, besides dropping some of the modes.
@partounian - I fully support your proposal to add whenToUse; it will streamline routing and remove a lot of token “ballast.”
Request to the Roo Commander maintainer: please add a whenToUse entry (even a minimal one) to every existing mode.
That said, this only fixes the cosmetic layer (saving a few hundred tokens); it does not address the core risk:
- Each call still drags in too much data (long roleDefinitions, stale chat messages, log snippets), quickly filling the context window.
- In long sessions we still hit the model's context limit, triggering truncation or other destructive fallbacks.
What else is required:
- Prompt filtering: send the model only the truly relevant slice of conversation and metadata; keep legacy mode descriptions completely out of the live prompt.
- A context-manager: once ~50-70% of the window is used, automatically summarize and rotate older messages into an archive, leaving a pointer for on-demand retrieval.
Only the combination of these two steps both removes noise from every request and guarantees we never overflow the context window, no matter how long the dialogue gets.
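The summarize-and-rotate step could be sketched as follows. This is an assumption about shape, not Roo Code's implementation; `summarize` stands in for a cheap LLM call, and the archive-pointer format is invented:

```typescript
interface Message { id: string; role: "user" | "assistant"; text: string }

interface RotationResult {
  live: Message[];     // what stays in the live prompt
  archived: Message[]; // moved out of the context window
  pointer: string;     // marker the model can ask to expand on demand
}

// Keep the last `keepLast` messages verbatim; compress everything older
// into one summary message that carries a pointer to the archive.
function rotate(
  history: Message[],
  keepLast: number,
  summarize: (msgs: Message[]) => string,
): RotationResult {
  if (history.length <= keepLast) return { live: history, archived: [], pointer: "" };
  const archived = history.slice(0, history.length - keepLast);
  const recent = history.slice(history.length - keepLast);
  const pointer = `[archive:${archived[0].id}..${archived[archived.length - 1].id}]`;
  const summary: Message = {
    id: "summary-" + archived[archived.length - 1].id,
    role: "assistant",
    text: `Summary of earlier conversation ${pointer}: ${summarize(archived)}`,
  };
  return { live: [summary, ...recent], archived, pointer };
}
```

The key design point is that nothing is destroyed: truncation is replaced by a lossy-but-recoverable summary plus a pointer back to the full archive.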
In the Discord, Jez (the maintainer) mentions that he is currently working on a full rewrite cleanup.
https://github.com/OleynikAleksandr/Mode-Manager-Extension
Architectural Axiom: Who Should Understand the Task in AI-Orchestration
Essence
No expensive AI model should be responsible for filtering context, identifying relevant fragments, or assembling the system prompt.
These operations are costly, token-hungry, and repetitive. They should be delegated to cheap or local systems (LLMs, heuristics, vector databases, static rules) that:
operate for free or near-free,
can afford to make mistakes and retry,
ultimately output a compact, focused prompt.
Reasons
1. Premium models charge per byte
Roo Code with 100+ modes can consume 10–30k tokens on the initial prompt alone.
This happens before the model even starts answering.
2. The point of a system prompt is to be focused
Not to dump the entire `.roomodes`, `.history`, `.workspace`, `.tools`, `.manual`, `.faq`, `.everything`.
3. Repetition is the most expensive enemy
If the same 15k-token instruction block is sent with every request, your system doesn't generalize — it just burns money.
What the cheap model (or rule) should do
Step | Executor | Why a cheap model is fit
-- | -- | --
Classify the query | local LLM / static rule | Costs ~100 tokens, low risk if wrong
Summarize history | summarizer (cheap LLM / MCP tool) | Trims tail cheaply
Retrieve relevant snippets | vector search / heuristic | Based on matching or embeddings
Assemble system prompt | template + JS/Python | Simple, structured logic

When to involve an expensive model
Only when:
the system prompt is already composed,
history has been filtered,
relevant context has been inserted,
the user question is clean and focused.
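The "classify the query" step from the table can be as simple as a static rule set with no LLM call at all. The mode names and keywords below are invented for illustration, not Roo Commander's real mode slugs:

```typescript
type Mode = "code" | "debug" | "docs" | "general";

// First match wins: cheap, deterministic, and low risk if wrong,
// since a misroute just picks a slightly suboptimal mode.
const RULES: Array<[RegExp, Mode]> = [
  [/\b(error|stack trace|exception|fails?)\b/i, "debug"],
  [/\b(readme|document|changelog|comment)\b/i, "docs"],
  [/\b(implement|refactor|function|class|bugfix)\b/i, "code"],
];

function classify(query: string): Mode {
  for (const [pattern, mode] of RULES) {
    if (pattern.test(query)) return mode;
  }
  return "general"; // fall through to a generic mode
}
```

In practice this rule layer would sit in front of a local LLM fallback: rules handle the obvious 80%, and only ambiguous queries cost even ~100 tokens.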
Example pipeline
```
User → Orchestrator (cheap LLM) → prompt structure (no content)
        ↓
History + snippets + mode → template → system prompt
        ↓
Roo Code / GPT-4 / Claude → gives the answer
```
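The final assembly step ("template + JS/Python" in the table) needs no LLM at all: plain string templating over the outputs of the earlier stages. The field and section names are assumptions for the sketch:

```typescript
interface PromptParts {
  modeDefinition: string; // only the selected mode, not all of .roomodes
  historySummary: string; // output of the summarizer stage
  snippets: string[];     // output of vector search / heuristics
  userQuestion: string;   // the clean, focused question
}

// Deterministic template assembly: the expensive model receives
// only this compact, pre-filtered prompt.
function assembleSystemPrompt(p: PromptParts): string {
  return [
    `## Mode\n${p.modeDefinition}`,
    `## Conversation summary\n${p.historySummary}`,
    `## Relevant context\n${p.snippets.join("\n---\n")}`,
    `## Task\n${p.userQuestion}`,
  ].join("\n\n");
}
```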
Implications
- All of `.roomodes` ≠ something to load every time.
- System prompt ≠ static block, but dynamic per-task assembly.
- Cheap model ≠ text generator, but conductor of the orchestration pipeline.
Conclusion
Saving ≠ squeezing more from the prompt.
Saving = eliminating dumb repetitions and offloading “task understanding” to where it costs nothing.
@OleynikAleksandr have you had success making any of your suggested changes?
https://github.com/jezweb/roo-commander/issues/46#issue-3116811136
I'm super impressed, this well beyond anything I could have created.