[Bug]: CodeActAgent awaits user input after each step
Is there an existing issue for the same bug?
- [X] I have checked the troubleshooting document at https://opendevin.github.io/OpenDevin/modules/usage/troubleshooting
- [X] I have checked the existing issues.
Describe the bug
Currently, with the most recent master version, the agent seems to await user input after each step. (Screenshot omitted; the logs below show the behavior.)
Current Version
```
commit 3d53d363b4416b05046dd390ca37d5745defce5a (origin/main, origin/HEAD, neubig/main, main)
Author: Boxuan Li <[email protected]>
Date:   Tue May 14 00:50:29 2024 -0700
```
Installation and Configuration
`make build; make run`
Model and Agent
- Model: GPT-4
- Agent: CodeAct
Reproduction Steps
No response
Logs, Errors, Screenshots, and Additional Context
In the logs, after every step I see:
```
08:30:01 - opendevin:INFO: agent_controller.py:189 - Setting agent(CodeActAgent) state from AgentState.RUNNING to AgentState.AWAITING_USER_INPUT
```
For instance, here is the first action:
```
08:56:13 - opendevin:INFO: codeact_agent.py:279 - Cost: 0.02 USD | Accumulated Cost: 0.02 USD
08:56:13 - ACTION
MessageAction(content="Let's start by cloning the repository and creating a new branch. I'll proceed with these steps first.", wait_for_response=True, action='message')
08:56:13 - opendevin:INFO: agent_controller.py:182 - Setting agent(CodeActAgent) state from AgentState.RUNNING to AgentState.AWAITING_USER_INPUT
```
add "Don't give me intermediate steps" to the task
https://github.com/OpenDevin/OpenDevin/blob/8ee6c938e9bc9e001a8cc90a8f3ba14f484a639a/agenthub/codeact_agent/codeact_agent.py#L265-L267
The `wait_for_response` value should come from the LLM.
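For illustration, here is a minimal sketch of what getting `wait_for_response` from the LLM could look like. The `<ask_user/>` tag and the `parse_response` helper are assumptions for this sketch, not OpenDevin's actual protocol:

```python
from dataclasses import dataclass

@dataclass
class MessageAction:
    content: str
    wait_for_response: bool = False
    action: str = 'message'

def parse_response(llm_output: str) -> MessageAction:
    # Hypothetical convention: the LLM appends <ask_user/> only when it
    # actually needs input, instead of the agent hard-coding
    # wait_for_response=True for every non-code message.
    wait = '<ask_user/>' in llm_output
    content = llm_output.replace('<ask_user/>', '').strip()
    return MessageAction(content=content, wait_for_response=wait)
```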
I'm not sure whether we can make CodeActAgent work like this: https://github.com/OpenDevin/OpenDevin/pull/1777
Could we let the LLM choose whether to continue executing or to ask for user input? @neubig
cc @xingyaoww @rbren
@assertion Based on my understanding of the solution in https://github.com/OpenDevin/OpenDevin/pull/1777, you are forcing the model to predict a True vs. False argument and parsing it to decide whether to ask for user input?
I think it eventually comes down to whether the LLM can choose between acting and asking for input. For GPT-4 this is probably not a problem: even if you don't force the model to generate arguments, it can still decide between performing tasks autonomously and asking humans.
If you instruct a weaker model to do so, I think it will struggle nevertheless -- even if you force it to generate this "ask user input = True vs. False" argument. Plus, we then have to deal with parsing it, which is kind of undesirable :(
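To make the parsing concern concrete, here is a rough sketch of the string handling such a True/False argument forces on us. The `ask user input = ...` format is an assumption for illustration, not the actual format used in #1777:

```python
import re

def parse_ask_user_flag(llm_output: str, default: bool = True) -> bool:
    # Hypothetical parser for an 'ask user input = True/False' argument.
    # Weaker models may emit 'true', 'Yes', or drop the field entirely,
    # so every variant needs a fallback -- the brittleness discussed above.
    match = re.search(r'ask user input\s*=\s*(\w+)', llm_output, re.IGNORECASE)
    if match is None:
        return default  # model omitted the argument entirely
    return match.group(1).lower() in ('true', 'yes', '1')
```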
I think we can probably address this by introducing a fully autonomous mode: https://github.com/OpenDevin/OpenDevin/issues/1798
@neubig Which version of GPT-4 are you running?
> the agent seems to await user input after each step.
Nope, only when there is no coding action.
Closing in favor of https://github.com/OpenDevin/OpenDevin/issues/1798