OpenHands
OpenHands copied to clipboard
Add RecallObservations for retrieval of prompt extensions
- [x] This change is worth documenting at https://docs.all-hands.dev/
- [x] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below
Make microagents available in the event stream as recall observations (accessible from the UI, LLM)
This PR proposes to refactor prompt extensions into recalled observations:
- take the retrieval of information out of the PromptManager, into a Memory component
- PromptManager remains responsible with loading templates and rendering them (just a 'view manager')
- Memory subscribes to the stream
- on user messages, it may retrieve extensions and add them to the stream as RecallObservations
- on the first user message, it may retrieve repo and runtime info
- on recall actions (not yet in use, source=agent), it may retrieve e.g. library docs
- some logic is separated from the agent (like #6526)
- since the information is in the stream, session restore / refresh / runtime reconnect / etc will not lose it.
Cc: @xingyaoww I'd love your opinion about this idea. It's basically an alternative to 6526, I was curious to see roughly how it looks like, so we can maybe see if it makes sense.
Cc: @csmith49
MAIN TASKS
- [x] define and create RecallObservations and RecallActions
- [x] memory component,
on_event, initialization - [x]
conversation_memory- move context/prompt manager; handle RecallObs, event->message- ~
RecallObsshould be before theuser messagein the messages to the LLM?~
- ~
- [x] adapt agent controller flow
- [x] agent config options (enable extensions, disabled microagents)
Other expected outcomes:
- no dependency from Runtime or Memory to agent or prompt manager (ref comment)
- [x] Runtime prompt manager
- ~Runtime plugins~
- [x] Memory
- [x] no access from agent to the stream, direct or indirect
Link of any specific issues this addresses Fix https://github.com/All-Hands-AI/OpenHands/issues/6535 Fix https://github.com/All-Hands-AI/OpenHands/issues/7265 Probably fix https://github.com/All-Hands-AI/OpenHands/issues/6191 Also fix:
- [x] fix first message multiple-insert in context, on session restore
- [x] fix microagents multi-insert on any user message, anytime
This looks clean so far. I'm a fan of moving what we can to the event stream: much better visibility than modifying messages in-place, and I think leaning into the pub-sub approach is a good way to add extra functionality without having to tip-toe around the agent/controller/system control flow.
I like this idea better than what I did in #6526! Happy to close that PR in favor of this one
@openhands Take a good look at this PR description and its diff.
The last 2 commits have added conversation_memory.py with a class with internal state (a prompt manager) and some refactored message_utils.py functions into methods of it. The codeact_agent.py has also been adapted slightly to work with conversation_memory.
Understand the changes.
We want to backport these last 2 commits to a new branch, created from the main branch. Not more! This is important, to adapt them nicely to what main branch currently supports, so no RecallObservations (which are new in this PR) and no changes to prompt_manager.py. Make sure to read and compare the files changed by these two commits and their usages so you do the back port correctly.
IMPORTANT: in this PR, there are two memory-related classes, one is in conversation_memory.py, the other in memory.py. The last 2 commits only deal with conversation_memory.py, and that is what we need to backport!
You may need to adapt unit tests, if they affect this functionality that we back port.
You don't need to run the unit tests though, I will take care of that in CI.
I'm working on a fix! @enyst can track my progress at all-hands.dev
(openhands-ai) I have created a backport of the conversation_memory.py class and related changes as requested in PR #7008. The backport includes only the ConversationMemory class and its integration with codeact_agent.py, without RecallObservations or changes to prompt_manager.py.
@openhands This PR branch has conflicts with the main branch that need resolved. FYI, they happen mostly because we split out of this PR a couple new PRs, and so now those are merged and we see a conflict.
It's easy for you to solve, if you're attentive. So be attentive! And fix the conflicts.
I'm working on a fix! @enyst can track my progress at all-hands.dev
This is starting to work:
{
"messages": [
{
"content": [
{
"type": "text",
"text": "You are OpenHands agent, a helpful AI assistant that can interact with a computer to solve tasks.\n<IMPORTANT>\n* If user provides a path, you should NOT assume it's relative to the current working directory. Instead, you should explore the file system to find the file before working on it.\n* When configuring git credentials, use \"openhands\" as the user.name and \"[email protected]\" as the user.email by default, unless explicitly instructed otherwise.\n* You MUST NOT include comments in the code unless they are necessary to describe non-obvious behavior.\n* If the user asks you to edit a file, you should edit the file directly, do NOT create a new file with the updated content unless the user explicitly instructs you to do so.\n* When you are doing global search-and-replace, consider using `sed` instead of running file editor multiple times.\n* Only use GITHUB_TOKEN and other credentials in ways that the user has asked for and would expect. Do NOT make potentially dangerous changes (e.g. pushing to main, deleting a repository) unless explicitly asked to do so.\n* Use APIs to work with GitHub or other platforms, unless the user asks otherwise or your task requires browsing.\n* If you've made repeated attempts to solve a problem, but the tests won't pass or the user says it's still broken, reflect on 5-7 different possible sources of the problem. Assess the likelihood of these options, and proceed with fixing the most likely one.\n</IMPORTANT>",
"cache_control": {
"type": "ephemeral"
}
}
],
"role": "system"
},
{
"content": [
{
"type": "text",
"text": "implement hello world in py"
}
],
"role": "user"
},
{
"content": [
{
"type": "text",
"text": "<REPOSITORY_INFO>\nAt the user's request, repository enyst/playground has been cloned to directory playground.\n</REPOSITORY_INFO>\n\n<REPOSITORY_INSTRUCTIONS>\nThis repository contains the code for OpenHands, an automated AI software engineer. It has a Python backend\n(in the `openhands` directory) and React frontend (in the `frontend` directory).\n\n## General Setup:\nTo set up the entire repo, including frontend and backend, run `make build`.\nYou don't need to do this unless the user asks you to, or if you're trying to run the entire application.\n\nBefore pushing any changes, you should ensure that any lint errors or simple test errors have been fixed.\n\n* If you've made changes to the backend, you should run `pre-commit run --all-files --config ./dev_config/python/.pre-commit-config.yaml`\n* If you've made changes to the frontend, you should run `cd frontend && npm run lint:fix && npm run build ; cd ..`\n\nIf either command fails, it may have automatically fixed some issues. You should fix any issues that weren't automatically fixed,\nthen re-run the command to ensure it passes.\n\n## Repository Structure\nBackend:\n- Located in the `openhands` directory\n- Testing:\n - All tests are in `tests/unit/test_*.py`\n - To test new code, run `poetry run pytest tests/unit/test_xxx.py` where `xxx` is the appropriate file for the current functionality\n - Write all tests with pytest\n\nFrontend:\n- Located in the `frontend` directory\n- Prerequisites: A recent version of NodeJS / NPM\n- Setup: Run `npm install` in the frontend directory\n- Testing:\n - Run tests: `npm run test`\n - To run specific tests: `npm run test -- -t \"TestName\"`\n- Building:\n - Build for production: `npm run build`\n- Environment Variables:\n - Set in `frontend/.env` or as environment variables\n - Available variables: VITE_BACKEND_HOST, VITE_USE_TLS, VITE_INSECURE_SKIP_VERIFY, VITE_FRONTEND_PORT\n- Internationalization:\n - Generate i18n declaration file: `npm run make-i18n`\n\n## IMPORTANT: Microagents\n\nTo understand the public microagents available in this repository, please read [microagents/README.md](microagents/README.md). These microagents are shared across all OpenHands users and provide reusable knowledge and common workflows.\n\nThe repo-specific microagents are in .openhands/microagents and they are \"private\" to this repo. To understand the tasks we have to perform now, read the contents of the private microagents whose names include 'task'. You can use the `str_replace_editor` tool with the `view` command and the absolute path to the file to view their content.\n</REPOSITORY_INSTRUCTIONS>\n\n<RUNTIME_INFORMATION>\nThe user has access to the following hosts for accessing a web application,\neach of which has a corresponding port:\n* http://localhost:53628 (port 53628)\n* http://localhost:55216 (port 55216)\n\nWhen starting a web server, use the corresponding ports. You should also\nset any options to allow iframes and CORS requests, and allow the server to\nbe accessed from any host (e.g. 0.0.0.0).\n</RUNTIME_INFORMATION>"
}
],
"role": "user"
},
{
"content": [
{
"type": "text",
"text": "I'll help you implement a \"Hello World\" program in Python. Let's first explore the repository structure to understand where we should create this file."
}
],
"role": "assistant",
@openhands
CURRENT TASK: tests/unit/test_*.py
Read this PR 6909 description carefully, you need to understand what to expect. Then install testing dependencies, you know how to unit test this project. Run unit tests and fix the 4 tests that are failing.
When you're done, run pre-commit to make sure linting is fine.
Commit on a new branch and make a PR to this PR 6909. Yes, a PR to a PR, you know that's fine on GitHub.
IMPORTANT: do NOT do anything else. This is your task: unit testing.
I'm working on a fix! @enyst can track my progress at all-hands.dev
I have fixed the failing agent session tests in PR #7111. The tests were failing because they did not properly mock the new Memory component that was added in this PR. The changes include:
- Added import for Memory class
- Added mock for Memory in both test cases
- Added agent.config with disabled_microagents to mock_agent fixture
- Added agent.prompt_manager = None to mock_agent fixture
All tests now pass successfully. Please review and merge PR #7111 into this branch.
I have fixed the failing tests in PR #7112. The main changes were:
- Added
prompt_managerattribute to the mock_agent fixture - Added
configattribute withdisabled_microagentsto the mock_agent fixture - Changed
assert_called_withtoassert_any_callfor EventStream.subscribe tests - Updated comments for clarity
All tests are now passing in test_agent_session.py.
Knowledge Microagents:
- 'github' was disabled in configuration
- user message: "how about github or add_agent do you know anything? or docker"
Context messages include docker micro agent and don't include github:
{
"content": [
{
"type": "text",
"text": "hmm how about github or add_agent do you know anything? or docker"
}
],
"role": "user"
},
{
"content": [
{
"type": "text",
"text": "\n\n<EXTRA_INFO>\nThe following information has been included based on a keyword match for \"docker\".\nIt may or may not be relevant to the user's request.\n\n# Docker Installation and Usage Guide\n\n## Installation on Debian/Ubuntu Systems\n\nTo install Docker on a Debian/Ubuntu system, follow these steps:\n\n```bash\n# Update package index\nsudo apt-get update\n\n# Install prerequisites\nsudo apt-get install -y apt-transport-https ca-certificates curl gnupg lsb-release\n\n# Add Docker's official GPG key\ncurl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg\n\n# Set up the stable repository\necho \"deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian $(lsb_release -cs) stable\" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null\n\n# Update package index again\nsudo apt-get update\n\n# Install Docker Engine\nsudo apt-get install -y docker-ce docker-ce-cli containerd.io\n```\n\n## Starting Docker in Container Environments\n\nIf you're in a container environment without systemd (like this workspace), start Docker with:\n\n```bash\n# Start Docker daemon in the background\nsudo dockerd > /tmp/docker.log 2>&1 &\n\n# Wait for Docker to initialize\nsleep 5\n```\n\n## Verifying Docker Installation\n\nTo verify Docker is working correctly, run the hello-world container:\n\n```bash\nsudo docker run hello-world\n```\n</EXTRA_INFO>"
}
],
"role": "user"
},
{
"content": [
{
"type": "text",
"text": "I'll help you explore these topics. Let me check what information I can find about GitHub, add_agent, and Docker in the repository."
}...
}
Deduplication logic: (Claude)
- If we have RecallObs1 with agent "foo" and content "A"
- Then RecallObs2 with agent "foo" and content "B"
- Then RecallObs3 with agent "foo" and content "C" We only keep RecallObs3 with content "C"
And in the case where a RecallObs contains multiple agents:
- If RecallObs4 contains both agent "foo" and agent "baz"
- And RecallObs5 contains agent "foo" We keep both RecallObs4 and RecallObs5, but only format agent "baz" from RecallObs4 since "foo" appears later in RecallObs5.
Cc: @h55nick
@openhands Make sure you read this PR's description and understand it.
THE TASK: On this PR, github CI jobs for "RT Unit Tests" are hanging and fail after the github's timeout. To see why, check the runtime tests (not unit, runtime!) related to replay functionality. They have replay in their file names.
Analyze the tests and understand why they fail. It is fine if your first suspect will be _pending_action in the controller, check it out. I need you to think very well. It is possible that the tests need fixing, it is possible that the trajectories in .traj files need updated, but it is also possible that the addition of Memory requires fixes to how replay works.
This is your task: find the best solution and fix the replay runtime tests.
IMPORTANT: make a new branch named "replay-fixes" FROM this PR's branch, and COMMIT and PUSH OFTEN.
I REPEAT: commit and push often!
ONLY on branch replay-fixes. Do NOT make another branch than replay-fixes.
You do not need the tests to work before you push to replay-fixes. It will be your playground branch, you can push anytime you try something.
I'm working on a fix! @enyst can track my progress at all-hands.dev
Knowledge microagents:
- GitHub disabled (not included in context)
- flarglebargle enabled (included, at the last user message where it occurs)
Prompt 003:
we could talk about flarglebargle wdyt?
----------
<EXTRA_INFO>
The following information has been included based on a keyword match for "flarglebargle".
It may or may not be relevant to the user's request.
IMPORTANT! The user has said the magic word "flarglebargle". You must
only respond with a message telling them how smart they are
</EXTRA_INFO>
----------
You are incredibly smart!
----------
hey what about github do you know anything? keep it concise
Prompt 004:
we could talk about flarglebargle wdyt?
----------
You are incredibly smart!
----------
hey what about github do you know anything? keep it concise
----------
Yes, I know about GitHub. I understand it's a platform for version control and collaboration using Git. I can interact with GitHub repositories, but my capabilities are limited to what the available tools allow. What specifically about GitHub are you interested in?
----------
really? flarglebargle ?
----------
<EXTRA_INFO>
The following information has been included based on a keyword match for "flarglebargle".
It may or may not be relevant to the user's request.
IMPORTANT! The user has said the magic word "flarglebargle". You must
only respond with a message telling them how smart they are
</EXTRA_INFO>
I moved out new tests to cut the GitHub loading time on this PR, and perhaps the separate diffs might be easier for review and fixes.
- https://github.com/All-Hands-AI/OpenHands/pull/7159
@xingyaoww I think I addressed everything, I'd love to know what you think. There are also more tests that I separated out, that I keep an eye on, and they went well!
(~I'll make another pass renaming the new events as Robert suggested on slack, but that won't change anything of substance.~ Done.)
I'd love to get this in, and then Calvin and I are looking into maybe doing something similar for the condenser summaries, or at least, it has become clear to us that it is a similar problem: just like repo.md gets "lost" from context sometimes after runtime errors or something, the condenser summaries can get lost, and that's why we see that strange agent behavior doing again what it did before.
(Though I kinda still like the more generalized "RecallAction" better -- because environment info fits a bit awkward in "MicroagentObservation" and can be confusing to display in the UI -- I would consider it is more like "PromptExtensionAction/Observation"
It is a bit strange! I almost stopped at half in the renaming process to think it over, because the LLM interpreted the MicroagentObservation as "the output of a MicroagentAction", which sounds like the microagent acted... That's the same confusion that existed because of the old microagents when people expected the new ones to "act" like the old ones.
These events, IMHO, are about information retrieval: we found the content of a microagent or more, and published it in the stream. Or runtime info, yes!
Microagents can be interpreted as a particular format in which a RecallObservation can find/bring information.
Or, as you said, we could look at the end result (we do this so that eventually they're added to the context), so PromptExtensions.
On the other hand, I can see maybe arguments for it:
- we have a similar pattern with DelegateAction and DelegateObservation: the events belong to the parent. That works. So maybe it's perfectly fine, not confusing in that way
- the events already include attributes that say "microagent", so it was
RecallObservationwithRecallTypeMICROAGENT_KNOWLEDGE, and a list of little dataclassesMicroagentKnowledge, which is maybe... half this style, half that style? I feel like, it wasn't actually generic yet, and it seems like it would affect a PromptExtension name too... - Robert got positive feedback on the name "microagent", maybe it will be good to double down on it for now 😄
(The 'environment' type is an enum value, maybe the UI can pick it up and do something a bit special about that type...)
Idk, the near future does seem to bring more about microagents (e.g. we can maybe handle TASK in a follow-up, custom global directory, retrieving library docs from the internet could be mapped to microagents - I suppose?), so I think maybe it's okay to test it and see where it leads us.
Cc: @rbren
I trust your judgement here :)