[Bug]: CodeActAgent awaits user input after each step
Is there an existing issue for the same bug?
- [X] I have checked the troubleshooting document at https://opendevin.github.io/OpenDevin/modules/usage/troubleshooting
- [X] I have checked the existing issues.
Describe the bug
Currently, with the most recent master version, the agent seems to await user input after each step. (Screenshot omitted; the logs below show the behavior.)
Current Version
```
commit 3d53d363b4416b05046dd390ca37d5745defce5a (origin/main, origin/HEAD, neubig/main, main)
Author: Boxuan Li <[email protected]>
Date:   Tue May 14 00:50:29 2024 -0700
```
Installation and Configuration
`make build; make run`
Model and Agent
- Model: GPT-4
- Agent: CodeAct
Reproduction Steps
No response
Logs, Errors, Screenshots, and Additional Context
In the logs, after every step I see:
```
08:30:01 - opendevin:INFO: agent_controller.py:189 - Setting agent(CodeActAgent) state from AgentState.RUNNING to AgentState.AWAITING_USER_INPUT
```
For instance, here is the first action:
```
08:56:13 - opendevin:INFO: codeact_agent.py:279 - Cost: 0.02 USD | Accumulated Cost: 0.02 USD
08:56:13 - ACTION
MessageAction(content="Let's start by cloning the repository and creating a new branch. I'll proceed with these steps first.", wait_for_response=True, action='message')
08:56:13 - opendevin:INFO: agent_controller.py:182 - Setting agent(CodeActAgent) state from AgentState.RUNNING to AgentState.AWAITING_USER_INPUT
```
add "Don't give me intermediate steps" to the task
https://github.com/OpenDevin/OpenDevin/blob/8ee6c938e9bc9e001a8cc90a8f3ba14f484a639a/agenthub/codeact_agent/codeact_agent.py#L265-L267
The `wait_for_response` value should come from the LLM.
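For illustration, here is a minimal sketch of what getting `wait_for_response` from the LLM could look like. The `<ask_user/>` tag and the `parse_response` helper are assumptions for this sketch, not OpenDevin's actual protocol:

```python
from dataclasses import dataclass

@dataclass
class MessageAction:
    content: str
    wait_for_response: bool = False
    action: str = 'message'

def parse_response(llm_output: str) -> MessageAction:
    # Hypothetical convention: the LLM appends <ask_user/> only when it
    # actually needs input, instead of the agent hard-coding
    # wait_for_response=True for every non-code message.
    wait = '<ask_user/>' in llm_output
    content = llm_output.replace('<ask_user/>', '').strip()
    return MessageAction(content=content, wait_for_response=wait)
```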
I'm not sure whether we can make CodeActAgent work like this: https://github.com/OpenDevin/OpenDevin/pull/1777
Could we let the LLM choose whether to continue executing or to ask for user input? @neubig
cc @xingyaoww @rbren
@assertion Based on my understanding of the solution in https://github.com/OpenDevin/OpenDevin/pull/1777, you are forcing the model to predict a True vs. False argument and parsing it to decide whether to ask for user input?
I think it eventually comes down to whether the LLM can choose between acting and asking for input. For GPT-4 this is probably not a problem: even if you don't force the model to generate arguments, it can still decide between performing tasks autonomously and asking humans.
If you instruct a weaker model to do so, I think it will struggle nevertheless -- even if you force it to generate this "ask user input = True vs. False" argument. Plus, we then have to deal with parsing it, which is kind of undesirable :(
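To make the parsing concern concrete, here is a rough sketch of the string handling such a True/False argument forces on us. The `ask user input = ...` format is an assumption for illustration, not the actual format used in #1777:

```python
import re

def parse_ask_user_flag(llm_output: str, default: bool = True) -> bool:
    # Hypothetical parser for an 'ask user input = True/False' argument.
    # Weaker models may emit 'true', 'Yes', or drop the field entirely,
    # so every variant needs a fallback -- the brittleness discussed above.
    match = re.search(r'ask user input\s*=\s*(\w+)', llm_output, re.IGNORECASE)
    if match is None:
        return default  # model omitted the argument entirely
    return match.group(1).lower() in ('true', 'yes', '1')
```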
I think we can probably address this by introducing a fully autonomous mode: https://github.com/OpenDevin/OpenDevin/issues/1798
@neubig Which version of GPT-4 are you running?
> the agent seems to await user input after each step.
Nope, only when there is no coding action.
Closing in favor of https://github.com/OpenDevin/OpenDevin/issues/1798