feat(CodeActAgent): Support Agent-User Interaction during Task Execution and the Full Integration of CodeActAgent

Open xingyaoww opened this issue 1 year ago • 11 comments

Dependency:

  • https://github.com/OpenDevin/OpenDevin/pull/1255
  • https://github.com/OpenDevin/OpenDevin/pull/1305
  • https://github.com/OpenDevin/OpenDevin/pull/1333
  • https://github.com/OpenDevin/OpenDevin/pull/1334
  • https://github.com/OpenDevin/OpenDevin/pull/1426
  • https://github.com/OpenDevin/OpenDevin/pull/1460

For general OpenDevin components:

  • Add IPythonRunCellAction, which lets the agent use the Jupyter plugin introduced in #1255 for task solving.
  • Add AgentTalkAction, which lets the agent communicate with the user (either to report results or to ask for clarification).
  • Agent-User Interaction: the agent can now wait for the user's response before proceeding to task completion.
  • Add a __str__ method to some of the actions so they print in a readable way.
  • Add a thought attribute to most Actions (where applicable).
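
To make the thought attribute and the readable __str__ output concrete, here is a minimal sketch. The class names follow the actions listed above, but the base class, the field names (code, content), and the output format are assumptions for illustration, not the actual OpenDevin definitions.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """Hypothetical base class; the real OpenDevin Action may differ."""
    thought: str = ""  # optional reasoning the agent attaches to the action


@dataclass
class IPythonRunCellAction(Action):
    code: str = ""  # assumed field name for the cell source

    def __str__(self) -> str:
        lines = ["**IPythonRunCellAction**"]
        if self.thought:
            lines.append(f"THOUGHT: {self.thought}")
        lines.append(f"CODE:\n{self.code}")
        return "\n".join(lines)


@dataclass
class AgentTalkAction(Action):
    content: str = ""  # assumed field name for the message to the user

    def __str__(self) -> str:
        lines = ["**AgentTalkAction**"]
        if self.thought:
            lines.append(f"THOUGHT: {self.thought}")
        lines.append(f"CONTENT: {self.content}")
        return "\n".join(lines)


if __name__ == "__main__":
    print(IPythonRunCellAction(code="print(1 + 1)", thought="Check the sum."))
    print(AgentTalkAction(content="Which port should the server use?"))
```

Keeping thought on the base class means every action can carry its reasoning without each subclass redefining the field.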

For CodeActAgent:

  • Allow CodeActAgent to do two things for environmental interactions: (1) use ipython AND (2) use bash
  • Tweak the prompts of CodeActAgent to ensure its generality.
  • Add the ability to use SWE-Agent's tools for task solving (still needs more extensive testing).
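
As a rough illustration of how one model response could be routed to either IPython or bash, the sketch below parses a response into an action kind, a thought, and a payload. The `<execute_ipython>` / `<execute_bash>` tag names and the `parse_action` helper are assumptions for illustration; this thread does not confirm the exact format the agent uses.

```python
import re
from typing import Tuple

# Assumed tag format: <execute_ipython>...</execute_ipython> or
# <execute_bash>...</execute_bash>. The backreference \1 forces the closing
# tag to match the opening one.
_EXECUTE = re.compile(r"<execute_(ipython|bash)>(.*?)</execute_\1>", re.DOTALL)


def parse_action(response: str) -> Tuple[str, str, str]:
    """Return (kind, thought, payload) for a raw model response.

    kind is "ipython", "bash", or "talk" (no executable block found, so the
    whole response is treated as a message to the user).
    """
    match = _EXECUTE.search(response)
    if match is None:
        return ("talk", "", response.strip())
    # Text before the executable block is treated as the agent's thought.
    thought = response[: match.start()].strip()
    return (match.group(1), thought, match.group(2).strip())


if __name__ == "__main__":
    reply = "Let me list the files.\n<execute_bash>\nls -la\n</execute_bash>"
    print(parse_action(reply))
    print(parse_action("Which port should I use?"))
```

Falling back to "talk" when no executable block is present is one simple way to connect this to the agent-user interaction described above.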

For integration tests:

  • Added integration tests for CodeActAgent.
  • Support mocking STDIN for "user responses" in integration tests.
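
One way to mock STDIN for scripted "user responses" is to temporarily swap sys.stdin for an in-memory buffer. This is only a sketch of the general technique, not the mechanism used in this PR; `mock_stdin` and `ask_user` are hypothetical helpers.

```python
import io
import sys
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def mock_stdin(responses: List[str]) -> Iterator[None]:
    """Temporarily replace sys.stdin with a buffer of scripted replies."""
    original = sys.stdin
    sys.stdin = io.StringIO("\n".join(responses) + "\n")
    try:
        yield
    finally:
        sys.stdin = original  # always restore the real stdin


def ask_user(question: str) -> str:
    """Hypothetical stand-in for the point where the agent waits for the user."""
    print(question)
    return sys.stdin.readline().rstrip("\n")


with mock_stdin(["use port 8000", "continue"]):
    first = ask_user("Which port should the server use?")
    second = ask_user("Shall I proceed?")

print(first, "|", second)
```

Restoring sys.stdin in a finally block keeps one test's scripted replies from leaking into the next test.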

The best thing about CodeActAgent is that we have existing data and open models (e.g., CodeActAgent, CodeQwen) trained on that data, which could mean we can have a usable local model that runs on a laptop.

The demo below was completed with CodeQwen; here's my config.toml:

LLM_MODEL="ollama/codeqwen"
LLM_API_KEY="ollama"
LLM_BASE_URL="http://localhost:11434"
LLM_EMBEDDING_MODEL="local"
WORKSPACE_BASE="./workspace"

image

xingyaoww avatar Apr 22 '24 16:04 xingyaoww

UPDATE: CodeActAgent can now write a calculator and start a Flask server in 5 steps! It can also ask the user clarifying questions, which, IMO, is more natural for AI-user interaction.

Here's my config.toml to reproduce the following example.

LLM_MODEL="gpt-4-turbo"
LLM_API_KEY="sk-TODO"
LLM_EMBEDDING_MODEL=""
WORKSPACE_BASE="./workspace"
SANDBOX_CONTAINER_IMAGE="ghcr.io/opendevin/sandbox:xw-swe-agent-tool-plugins"

Command to run: python ./opendevin/main.py -i 10 -c CodeActAgent --task "can you help me start a flask server that build a calculator at 8000?"

image image image

Created website server:

image

image

xingyaoww avatar Apr 24 '24 10:04 xingyaoww

Finally, I fixed some issues with the front end! This PR should be ready for review!

Here's a video demo for the front end with CodeActAgent (no speed-up).

https://github.com/OpenDevin/OpenDevin/assets/38853559/5563f20a-c1ac-483b-a48e-56968090cb13


There are a bunch of frontend issues that need to be fixed:

  • Sometimes, the model asks for the user's approval before continuing to execute the code; we should show a "[Continue]" button that sends the text 'continue' to the agent when "Agent is awaiting user input." This would save the user the effort of typing the word "continue."
image
  • IPython code execution is recognized as undefined by the front end. We will need to make the Observation from IPythonAction a special class and handle it differently on the front end. I will probably work on this in a separate PR for clarity. Also, the front end needs to support visualization of IPythonAction, potentially by adding an additional tab between CodeEditor and Browser.
image
  • We need to let the user know things are running! There are a bunch of logs/info on the backend; maybe we can consider displaying some of them to the user so they know the agent is working. Also, some animations that highlight which panel is changing could be helpful for the user. We may have to get token streaming working at some point; otherwise, the user has to wait a long time for a calculator_app.py to be written.

  • We need to get the browser working and monitor both the chat between the agent & user and the command line. When it detects URL patterns like http://127.0.0.1:8000, a window should pop up asking the user whether to open that URL in the built-in browser.

  • Around 01:47, we can see the terminal cannot be scrolled down to the bottom. There are probably some CSS issues.
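
The URL-detection item above could be sketched as a small regex scan over chat or terminal text. The pattern and the `find_local_urls` helper are assumptions for illustration; a real front end would likely do this in its own language and might need a stricter pattern.

```python
import re
from typing import List

# Matches http(s) URLs pointing at common local hosts, with an optional port
# and path. Illustrative only; not taken from the OpenDevin codebase.
_LOCAL_URL = re.compile(
    r"https?://(?:127\.0\.0\.1|localhost|0\.0\.0\.0)"
    r"(?::\d{1,5})?"
    r"(?:/[^\s\"'<>]*)?"
)


def find_local_urls(text: str) -> List[str]:
    """Return the distinct local URLs found in text, in order of appearance."""
    found: List[str] = []
    for match in _LOCAL_URL.finditer(text):
        url = match.group(0).rstrip(".,;)")  # drop trailing punctuation
        if url not in found:
            found.append(url)
    return found


if __name__ == "__main__":
    log_line = " * Running on http://127.0.0.1:8000 (Press CTRL+C to quit)"
    for url in find_local_urls(log_line):
        print(f"Open {url} in the built-in browser?")
```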


Next steps:

  • Try to systematically evaluate this on SWE-Bench
  • Try to see if we can get this working on OSS models
  • Maybe support uploading files to the workspace so that the agent can better analyze them

xingyaoww avatar Apr 24 '24 12:04 xingyaoww

It turns out I broke the integration tests by adding a thought attribute to most actions. Now, I finally fixed it!

I also added integration tests for CodeActAgent and supported the STDIN mockup for "user responses" during integration tests. (cc @li-boxuan)

CodeActAgent will fail with the exec integration task (well, and it is not designed to work with the exec box since it is not stateful). @rbren and I were thinking about deprecating ExecBox soon, so I removed it from the integration test for now.

This should be ready for review.

xingyaoww avatar Apr 27 '24 03:04 xingyaoww

Btw did you encounter any annoying problems when trying to fix integration tests? Anything you think could be improved?

li-boxuan avatar Apr 27 '24 05:04 li-boxuan

Another thought 💭 :

I presume this agent can do many fancier things than other simpler ones (PlannerAgent? MonologueAgent?). You could probably add a test to demonstrate its larger power. That will also increase the code coverage. Right now, CodeActAgent-specific commands are not tested in the only simple naive integration test we have. This, of course, doesn't need to be addressed in current PR.

li-boxuan avatar Apr 27 '24 05:04 li-boxuan

@li-boxuan, thanks for the review!!

> Btw did you encounter any annoying problems when trying to fix integration tests? Anything you think could be improved?

Nope! I was confused at first until I found the README.md - it was very clear & easy for me to get the integration test fixed.

> I presume this agent can do many fancier things than other simpler ones (PlannerAgent? MonologueAgent?). You could probably add a test to demonstrate its larger power. That will also increase the code coverage. Right now, CodeActAgent-specific commands are not tested in the only simple naive integration test we have. This, of course, doesn't need to be addressed in current PR.

Good idea! I think I will do this in a follow-up PR to include more challenging tasks that require multiple interactions with BOTH bash & python.

There will be a lot of TODOs following this:

  • Get frontend working well with the new IPython feature https://github.com/OpenDevin/OpenDevin/pull/1363
  • Add more integration tests for more complex behaviors (multi-turn interactions with both bash & python)
  • Get SWE-Bench ready for testing (prompts are also subject to change)

xingyaoww avatar Apr 27 '24 09:04 xingyaoww

With some modifications (https://github.com/OpenDevin/OpenDevin/pull/1426), I am finally able to get CodeActAgent working in the DooD sandbox with RUN_AS_DEVIN and network=host without issues :)

Steps to reproduce:

  1. switch to this PR
  2. build the app container: ./containers/build.sh sandbox (you may need to comment out a few lines in build.sh to make it work on your local machine)
  3. run the following, switch to CodeActAgent on the web interface, and it should work:
docker run \
    -e LLM_API_KEY \
    -e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE \
    -v $WORKSPACE_BASE:/opt/workspace_base \
    -e SANDBOX_USER_ID=$(id -u) \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p 3000:3000 \
    --add-host host.docker.internal=host-gateway \
    ghcr.io/opendevin/opendevin:latest

xingyaoww avatar Apr 28 '24 12:04 xingyaoww

Codecov Report

Attention: Patch coverage is 71.24183% with 44 lines in your changes are missing coverage. Please review.

:exclamation: No coverage uploaded for pull request base (main@62e4fb4).

Files                                      Patch %   Lines
opendevin/action/bash.py                   62.16%    14 Missing
opendevin/controller/agent_controller.py   62.50%    12 Missing
opendevin/server/agent/agent.py             0.00%     8 Missing
agenthub/codeact_agent/codeact_agent.py    76.92%     6 Missing
opendevin/action/agent.py                  85.71%     2 Missing
opendevin/observation/run.py               80.00%     2 Missing

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1290   +/-   ##
=======================================
  Coverage        ?   60.66%           
=======================================
  Files           ?       84           
  Lines           ?     3605           
  Branches        ?        0           
=======================================
  Hits            ?     2187           
  Misses          ?     1418           
  Partials        ?        0           

codecov-commenter avatar Apr 30 '24 08:04 codecov-commenter

@xingyaoww I implemented the user_message action here: https://github.com/OpenDevin/OpenDevin/commits/rb/codeact/

But I can't find a way to contribute it to your PR...

Edit: this is as far as I got

https://github.com/xingyaoww/OpenDevin/pull/1/files

rbren avatar Apr 30 '24 21:04 rbren

@xingyaoww once my PR is merged into yours, this LGTM!

rbren avatar Apr 30 '24 21:04 rbren

@rbren merged! this should be ready to go!

xingyaoww avatar May 01 '24 08:05 xingyaoww

🎉 glad to have this in!

rbren avatar May 01 '24 12:05 rbren

@rbren Thanks!!!

Next steps:

  • Frontend!
    • https://github.com/OpenDevin/OpenDevin/pull/1363/
    • Switch to Jupyter Tab when a python execution request comes in (once https://github.com/OpenDevin/OpenDevin/pull/1363/ is merged)
    • https://github.com/OpenDevin/OpenDevin/issues/1487
  • Evaluation: get SWE-Bench score for CodeActAgent!

xingyaoww avatar May 01 '24 12:05 xingyaoww