Feature: user-managed context
Summary
Adds a feature that lets users manually edit which messages are included in their conversation context, so they can extend the period of a Goose session during which the model is at peak performance.
Users working with agents on anything complex (e.g. coding) often see a curve in model effectiveness as their session grows, because all previous messages in the session are included in the payload sent to the model. At first, as the user asks questions about files in the codebase or completes initial tasks, the agent adds valuable information to the context, "warming up" to reach peak performance. Later in the session, however, performance degrades as the context fills up: response times increase, output quality drops, and users are eventually forced to start a new session and rebuild their initial context from scratch.
This feature adds a messenger-app-style interface for removing previous messages (and model responses / tool calls) from the context, allowing users to extend their sessions by dropping old messages. This is especially helpful for sequential tasks, e.g. "migrate a list of 10 fraud rules from Python to another language, where the rules do not depend on each other". A user can go rule by rule, feeding each rule into Goose and then removing the previous rules from the session context before progressing to the next one, while preserving the initial setup information about the destination codebase in the session's first messages.
This is an alternative to automated compaction that gives the user control, especially for long-running iterative sessions like the example above, where an auto-compaction strategy does not have enough information to deprioritize irrelevant steps.
Type of Change
- [X] Feature
- [ ] Bug fix
- [ ] Refactor / Code quality
- [ ] Performance improvement
- [ ] Documentation
- [ ] Tests
- [ ] Security fix
- [ ] Build / Release
- [ ] Other (specify below)
AI Assistance
- [X] This PR was created or reviewed with AI assistance
Testing
Extensive manual testing and the existing cargo tests.
Discussion: LINK (if any)
Screenshots/Demos (for UX changes)
https://github.com/user-attachments/assets/a381a62a-4072-440a-aae9-60ca328f3823
One challenge that complicates things quite a bit here is that the visual conversation state in the client doesn't fully match the agent's context, since Goose runs in the pattern of:
Conversation 1 → Compaction → Conversation 2 → Compaction → …
So messages from Conversation 1 prior to the first compaction still show in the client, but aren't agent-visible. And if compaction does occur, all of your hand-chosen messages need to be manually restored.
Overall though, this feels like a rather heavy abstraction and puts a lot of onus on the user to scroll through the conversation (which isn't the best experience) and manage the context. One nice thing about auto-compaction now is that it happens automatically, with the big downside, as you called out, being that it's lossy.
I really like your general goal, though, of improving the warmup phase. I wonder if there's some lower-hanging fruit there. One thing you can do is use a .goosehints file or the memory extension, but those aren't session-specific.
Maybe an interesting abstraction could be some kind of session memory that you can add messages and files to, and that is durable post-compaction to make the warmup smoother, rather than trying to manually manage the entire context window.
The way I've been using this for migration work, I never hit Compaction, because I'm managing my conversation context manually. There's probably a more graceful way to have these two context management strategies intersect, happy to poke at that if you think valuable @katzdave?
I'm finding the automated compaction is so lossy that I basically just have to start new sessions if compaction happens when I'm doing any complex iterative work (e.g. migrating fraud rules that don't depend on each other). Also, I was finding performance degradation starts before you hit the compaction threshold, so managing the context early is really valuable. I do use both goosehints and the memory extensions, this is more for session specific memory.
General thesis here is that there's value in giving users control over their session context. If they don't want that control, they can rely on auto compaction. But for power users running long sessions, this approach of user managed context has allowed me to move a lot faster on some recent migration work.
I think compaction is a bit inevitable, because the act of using your session works against manually managing the context: it will slowly degrade what you've curated.
I'm with you that compaction can be a bit abrupt and that larger contexts can degrade even before compaction (plus they cost more and are slower). I generally keep my auto-compact threshold around 50%.
Recipes are also a good way for building initial context on fresh sessions, but we don't really have a tie in there for post-compaction.
Given that you have a particular use case in mind, it would be really interesting to see if there's anything that can be done specifically about the post-compaction degradation. What are the specific gaps in context that are being lost? Would something like 'protecting' those messages from compaction have an impact?
Or maybe some kind of on-compact hook to automatically run a recipe on the compacted session?
Closing this for now. We discussed offline that recipes or copying sessions after they have the initial context could help for this use case.