feat(CodeActAgent): Support Agent-User Interaction during Task Execution and the Full Integration of CodeActAgent
Dependency:
- https://github.com/OpenDevin/OpenDevin/pull/1255
- https://github.com/OpenDevin/OpenDevin/pull/1305
- https://github.com/OpenDevin/OpenDevin/pull/1333
- https://github.com/OpenDevin/OpenDevin/pull/1334
- https://github.com/OpenDevin/OpenDevin/pull/1426
- https://github.com/OpenDevin/OpenDevin/pull/1460
For general OpenDevin components:
- Add `IPythonRunCellAction`, which allows the agent to use the Jupyter plugin introduced in #1255 for task solving.
- Add `AgentTalkAction`, which allows the agent to communicate with the user (either to communicate results OR to ask for clarification).
- Agent-User Interaction: the agent can now wait for the user's response before proceeding to task completion.
- Add a `__str__` method to some of the actions so they are printed in a readable way.
- Add a `thought` attribute to most `Action`s (if applicable).
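As a rough illustration of the `thought` attribute and `__str__` method described above, here is a minimal sketch. The class layout, field names other than `thought`, and the printed format are assumptions made for this example and are not copied from the OpenDevin code base.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """Hypothetical base class; not OpenDevin's actual definition."""
    thought: str = ""  # free-form reasoning attached to the action


@dataclass
class IPythonRunCellAction(Action):
    code: str = ""  # assumed field name for the cell source

    def __str__(self) -> str:
        # Readable multi-line rendering for logs.
        lines = ["**IPythonRunCellAction**"]
        if self.thought:
            lines.append(f"THOUGHT: {self.thought}")
        lines.append(f"CODE:\n{self.code}")
        return "\n".join(lines)


@dataclass
class AgentTalkAction(Action):
    content: str = ""  # assumed field name for the message to the user

    def __str__(self) -> str:
        lines = ["**AgentTalkAction**"]
        if self.thought:
            lines.append(f"THOUGHT: {self.thought}")
        lines.append(f"CONTENT: {self.content}")
        return "\n".join(lines)
```

With this layout, `print(IPythonRunCellAction(thought="check", code="print(1)"))` shows the thought and the code on separate lines instead of the default dataclass repr.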
For CodeActAgent:
- Allow CodeActAgent to do two things for environmental interactions: (1) use `ipython` AND (2) use `bash`.
- Tweak the prompts of CodeActAgent to ensure its generality.
- The ability to use SWE-Agent's tool for task solving (still needs more extensive testing).
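To make the two interaction modes concrete, here is a minimal sketch of how a model response could be routed to IPython, bash, or a plain message to the user. The tag names `<execute_ipython>` and `<execute_bash>` and the function `parse_response` are assumptions for this example; the actual prompt format in this PR may differ.

```python
import re
from typing import Tuple

# Assumed tag names; the real prompt format may differ.
_PATTERNS = {
    "ipython": re.compile(r"<execute_ipython>(.*?)</execute_ipython>", re.DOTALL),
    "bash": re.compile(r"<execute_bash>(.*?)</execute_bash>", re.DOTALL),
}


def parse_response(text: str) -> Tuple[str, str]:
    """Return (kind, payload), where kind is 'ipython', 'bash', or 'talk'."""
    best = None
    for kind, pattern in _PATTERNS.items():
        match = pattern.search(text)
        # Keep whichever tagged block appears first in the response.
        if match and (best is None or match.start() < best[1].start()):
            best = (kind, match)
    if best is None:
        # No executable block: treat the whole response as a message to the user.
        return "talk", text.strip()
    kind, match = best
    return kind, match.group(1).strip()
```

A response with no tagged block falls through to `talk`, which is where the agent would wait for the user's reply.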
For integration tests:
- Added integration tests for CodeActAgent.
- Support STDIN mock for "user responses" in integration test
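One way to mock STDIN for "user responses" is to swap `sys.stdin` for an in-memory buffer during the test. The sketch below only illustrates that idea; `mock_stdin` and `ask_user` are hypothetical names, not the helpers used in this PR's test suite.

```python
import io
import sys
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def mock_stdin(responses: List[str]) -> Iterator[None]:
    """Temporarily feed canned user replies to anything that reads STDIN."""
    original = sys.stdin
    sys.stdin = io.StringIO("".join(reply + "\n" for reply in responses))
    try:
        yield
    finally:
        sys.stdin = original  # always restore the real STDIN


def ask_user(prompt: str) -> str:
    """Hypothetical stand-in for the point where the agent waits for the user."""
    return input(prompt)
```

Inside `with mock_stdin(["continue", "exit"]):`, successive `ask_user` calls return the canned replies in order, so a multi-turn exchange can run without a human at the keyboard.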
The best thing about CodeActAgent is that we have existing data and open models (e.g., CodeActAgent, CodeQwen) trained on that data, which could mean we can have a usable local model that runs on a laptop.
The demo below was completed with CodeQwen. Here's my config.toml:
LLM_MODEL="ollama/codeqwen"
LLM_API_KEY="ollama"
LLM_BASE_URL="http://localhost:11434"
LLM_EMBEDDING_MODEL="local"
WORKSPACE_BASE="./workspace"
UPDATE: CodeActAgent can now write a calculator and start a Flask server in 5 steps! It is also integrated with the ability to ask users clarifying questions, which, IMO, is more natural for AI-user interaction.
Here's my config.toml to reproduce the following example.
LLM_MODEL="gpt-4-turbo"
LLM_API_KEY="sk-TODO"
LLM_EMBEDDING_MODEL=""
WORKSPACE_BASE="./workspace"
SANDBOX_CONTAINER_IMAGE="ghcr.io/opendevin/sandbox:xw-swe-agent-tool-plugins"
Command to run:
python ./opendevin/main.py -i 10 -c CodeActAgent --task "can you help me start a flask server that build a calculator at 8000?"
Created website server:
Finally, I fixed some issues with the front end! This PR should be ready for review!
Here's a video demo for the front end with CodeActAgent (no speed-up).
https://github.com/OpenDevin/OpenDevin/assets/38853559/5563f20a-c1ac-483b-a48e-56968090cb13
There are a bunch of frontend issues that need to be fixed:
- Sometimes, the model asks for the user's approval before continuing to execute the code; we should show a "[Continue]" button that sends the text 'continue' to the agent when "Agent is awaiting user input." This would save the user the effort of typing the word "continue".
- IPython code execution is recognized as `undefined` by the front end. We will need to make the `Observation` from `IPythonAction` a special class and handle it differently on the front end. I will probably work on this in a separate PR for clarity. Also, the front end needs to support visualization of `IPythonAction`, potentially by adding an additional tab between `CodeEditor` and `Browser`.
- We need to let the user know things are running! There are a bunch of logs/info on the backend; maybe we can consider displaying some of them to the user so they know the agent is working. Also, some animations showing which panel is changing could be helpful for the user. We may have to get token streaming working at some point; otherwise, the user has to wait a long time for a `calculator_app.py` to be written.
- We need to get the browser working and monitor the chat between the agent & user and the command line. When it detects URL patterns like `http://127.0.0.1:8000`, a window should pop up asking the user whether to open that URL in the built-in browser.
- Around 01:47, we can see the terminal cannot be scrolled down to the bottom. There are probably some CSS issues.
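For the URL detection idea in the list above, a regular expression over the chat and terminal text is probably enough as a first pass. This is a minimal sketch; the pattern and the helper name `find_local_urls` are assumptions for illustration, and the real check would likely live in the front end.

```python
import re
from typing import List

# Matches http(s) URLs on 127.0.0.1 or localhost with an explicit port,
# plus an optional path. Other hosts are deliberately ignored.
LOCAL_URL = re.compile(
    r"https?://(?:127\.0\.0\.1|localhost):\d{1,5}(?:/[^\s'\"<>]*)?"
)


def find_local_urls(text: str) -> List[str]:
    """Return unique local URLs in order of first appearance."""
    found: List[str] = []
    for match in LOCAL_URL.finditer(text):
        # Drop trailing punctuation that belongs to the surrounding sentence.
        url = match.group(0).rstrip(".,;)")
        if url not in found:
            found.append(url)
    return found
```

Each URL returned here would be a candidate for the "open in built-in browser?" prompt.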
Next steps:
- Try to systematically evaluate this on SWE-Bench
- Try to see if we can get this working on OSS models
- Maybe support file uploading to workspace so that agent can better analyze them
It turns out I broke the integration tests by adding a thought attribute to most actions. Now, I finally fixed it!
I also added integration tests for CodeActAgent and supported the STDIN mockup for "user responses" during integration tests. (cc @li-boxuan)
CodeActAgent will fail with the exec integration task (well, and it is not designed to work with the exec box since it is not stateful). @rbren and I were thinking about deprecating ExecBox soon, so I removed it from the integration test for now.
This should be ready for review.
Btw did you encounter any annoying problems when trying to fix integration tests? Anything you think could be improved?
Another thought 💠:
I presume this agent can do many fancier things than other simpler ones (PlannerAgent? MonologueAgent?). You could probably add a test to demonstrate its larger power. That will also increase the code coverage. Right now, CodeActAgent-specific commands are not tested in the only simple naive integration test we have. This, of course, doesn't need to be addressed in current PR.
@li-boxuan, thanks for the review!!
> Btw did you encounter any annoying problems when trying to fix integration tests? Anything you think could be improved?
Nope! I was confused at first until I found the README.md - it was very clear & easy for me to get the integration test fixed.
> I presume this agent can do many fancier things than other simpler ones (PlannerAgent? MonologueAgent?). You could probably add a test to demonstrate its larger power. That will also increase the code coverage. Right now, CodeActAgent-specific commands are not tested in the only simple naive integration test we have. This, of course, doesn't need to be addressed in current PR.
Good idea! I think I will do this in a follow-up PR to include more challenging tasks that require multiple interactions with BOTH bash & python.
There will be a lot of TODOs following this:
- Get frontend working well with the new IPython feature https://github.com/OpenDevin/OpenDevin/pull/1363
- Add more integration tests for more complex behaviors (multi-turn interactions with both bash & python)
- Get SWE-Bench ready for testing (prompts are also subject to change)
With some modifications (https://github.com/OpenDevin/OpenDevin/pull/1426), I am finally able to get CodeActAgent working in the DooD sandbox with RUN_AS_DEVIN and network=host without issues :)
Steps to reproduce:
- switch to this PR
- build the `app` container: `./containers/build.sh sandbox` (you may need to comment out a few lines in `build.sh` to make it work on your local machine)
- run the following, switch to CodeActAgent on the web interface, and it should work:
docker run \
-e LLM_API_KEY \
-e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE \
-v $WORKSPACE_BASE:/opt/workspace_base \
-e SANDBOX_USER_ID=$(id -u) \
-v /var/run/docker.sock:/var/run/docker.sock \
-p 3000:3000 \
--add-host host.docker.internal=host-gateway \
ghcr.io/opendevin/opendevin:latest
Codecov Report
Attention: Patch coverage is 71.24183% with 44 lines in your changes are missing coverage. Please review.
No coverage uploaded for pull request base (main@62e4fb4).
Additional details and impacted files
@@ Coverage Diff @@
## main #1290 +/- ##
=======================================
Coverage ? 60.66%
=======================================
Files ? 84
Lines ? 3605
Branches ? 0
=======================================
Hits ? 2187
Misses ? 1418
Partials ? 0
@xingyaoww I implemented the user_message action here: https://github.com/OpenDevin/OpenDevin/commits/rb/codeact/
But I can't find a way to contribute it to your PR...
Edit: this is as far as I got
https://github.com/xingyaoww/OpenDevin/pull/1/files
@xingyaoww once my PR is merged into yours, this LGTM!
@rbren merged! this should be ready to go!
🎉 glad to have this in!
@rbren Thanks!!!
Next steps:
- Frontend!
- https://github.com/OpenDevin/OpenDevin/pull/1363/
- Switch to Jupyter Tab when a python execution request comes in (once https://github.com/OpenDevin/OpenDevin/pull/1363/ is merged)
- https://github.com/OpenDevin/OpenDevin/issues/1487
- Evaluation: get SWE-Bench score for CodeActAgent!