
[Bug]: Loop detection kills agents that are waiting on long-running processes

Open neubig opened this issue 1 year ago • 12 comments

Is there an existing issue for the same bug?

  • [X] I have checked the existing issues.

Describe the bug and reproduction steps

We have loop detection in OpenHands to prevent agents from getting stuck in a loop and waiting forever. However, a confluence of two things has made loop detection potentially harmful:

  1. Agents based on the new claude-sonnet-3.5 are much better and rarely get stuck in loops
  2. We have a 2-minute timeout on agents, which means that they need to repeatedly sleep while waiting for long-running processes

As a result, I have recently seen multiple instances of agents waiting on processes to finish, but getting killed with "agent got stuck in a loop" when they waited too long. I don't know the best solution to this conceptually, but I wanted to flag it as an issue.

OpenHands Installation

Docker command in README

OpenHands Version

No response

Operating System

None

Logs, Errors, Screenshots, and Additional Context

No response

neubig avatar Dec 01 '24 22:12 neubig

Do we have an example log from this? Sorry, off-hand I don't see why waiting would do that... 🤔

enyst avatar Dec 01 '24 22:12 enyst

To elaborate a bit on why this confuses me:

  • the loop detection works on history: actions and obs that have been completed and are in history; it doesn't check things during the creation of an observation
  • the loop detection check should be attempted once per step, regardless of how long the step took...

And as I was writing this and checking the code, it seems the last one isn't quite true anymore: when we moved the check from the end of the step to the beginning of the step, it ended up before we check whether there's a delegate in progress... which doesn't seem like the correct thing to do. It means the parent agent will endlessly check the same thing (the last steps of its own history) while waiting for a delegate.

Was this seen with a delegate?

I'm still not sure why it would trigger the stop, because the first remark above should still apply (it spins over and over on its own completed steps), but we should move it down:

  • https://github.com/All-Hands-AI/OpenHands/pull/5458
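
To make the ordering concrete, here is a minimal sketch with hypothetical names (not the actual OpenHands controller code): if the stuck check runs before the delegate check, the parent agent keeps re-evaluating the same tail of its own history while a delegate is still working.

class AgentController:
    def __init__(self, stuck_detector, state):
        self.stuck_detector = stuck_detector  # hypothetical helper
        self.state = state
        self.delegate = None

    def step(self):
        # Problematic order: the stuck check runs first...
        if self.stuck_detector.is_stuck(self.state.history):
            raise RuntimeError("Agent got stuck in a loop")

        # ...so this early return never gets the chance to skip it while a
        # delegate is still running and the parent history hasn't changed.
        if self.delegate is not None and not self.delegate.finished:
            return

        # The fix suggested above: move the stuck check below the delegate
        # check, so it only runs when the parent agent takes a real step.
        ...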

enyst avatar Dec 07 '24 22:12 enyst

Sorry, to elaborate more: what happened in the original bug is that the agent would send many actions in a row like

It looks like the action is still running, let me try again in a moment.
<execute_ipython>
import time
time.sleep(60)
# GitHub action checking code
</execute_ipython>

Doing this several times in a row is actually desired behavior, not an undesirable loop. But our loop detection kills it anyway.

neubig avatar Dec 07 '24 23:12 neubig

Ah! That's very interesting. I've been thinking about this before, when we were trying to split commands ourselves; here the agent is doing it itself.

Is the "thought" also the same? I would really love to see a log of this, or at least trajectories.

Edited to add: I believe there is some check that ignores thoughts... but it's for editing errors. 🤔 I mean, while this could happen anyway with identical thoughts, I wonder if there's anything we can do to make it extremely rare to begin with.

enyst avatar Dec 07 '24 23:12 enyst

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Jan 07 '25 01:01 github-actions[bot]

Seconding this, but with model training on a laptop (with 0.20 and GPU integration, prompt below). I think a very hacky way of solving this is to literally have the agent make small talk so that it won't just cause issues.


Please achieve the following:

  1. Clone the latest main AI Scientist repo (keep it separate from GPU installation aides) https://github.com/SakanaAI/AI-Scientist and note that OPENROUTER_API_KEY is sk-or-v1-[REDACTED], do not use any other API keys, including but not limited to SemanticScholar S2_API_KEY, Anthropic (Claude Sonnet-3.5) ANTHROPIC_API_KEY, OpenAI (GPT-4o) OPENAI_API_KEY, and DeepSeek (use OpenRouter and not native with DEEPSEEK_API_KEY)
  2. Successfully use nanoGPT_lite as a sanity check to see that the AI Scientist application is working using OpenAlex, see link to README document, if there are errors in downloading data files please consider adding User Agents and Headers https://raw.githubusercontent.com/SakanaAI/AI-Scientist/refs/heads/main/README.md
  3. Create a local branch of change such that Sakana AI can use openrouter/deepseek/deepseek-chat (or other available model) through OpenRouter instead of being limited to openrouter/meta-llama/llama-3.1-405b-instruct, or using the DeepSeek API to access the DeepSeek model

Note:

  1. be mindful of folder case, for example between the AI-Scientist repo and ai_scientist folder
  2. it is good to have small talk while waiting for experiments that are running in the background, check every 10 minutes

BradKML avatar Jan 15 '25 09:01 BradKML

I think a new iteration of this issue is the agent waiting while forgetting that Python pip or Linux apt has already finished (probably other package managers and other scripts too). @enyst which tool would that issue come from?

BradKML avatar Jan 29 '25 04:01 BradKML

There were some changes around this loop detection recently. Have any of those resolved this issue?

mamoodi avatar Mar 11 '25 15:03 mamoodi

No, it hasn't; this is a different kind of issue and it probably needs a different kind of solution. Usually we have a deterministic set of checks to stop the loop if, for example, the agent is doing exactly the same action and gets the same obs 3 times in a row, because we know LLMs sometimes fall into that and it will likely never stop on its own.

But sometimes the actions are legitimately repetitive, like when a process hasn't finished yet and the agent says "let me check the output to see if it finished" multiple times. If we let it continue, the agent would likely move on to other work once the process finishes; it's not necessarily in a doom loop.

Those simple checks for "the same" action can't tell the difference between these two scenarios.
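
As a minimal sketch of what those checks look like (illustrative names, not the actual StuckDetector code), something along these lines flags both cases the same way:

def is_stuck(history, n=3):
    """Flag a loop if the last n (action, observation) pairs are identical."""
    if len(history) < n:
        return False
    first_action, first_obs = history[-n]
    return all(action == first_action and obs == first_obs
               for action, obs in history[-n:])

# A genuine doom loop and a legitimate "sleep 60 and re-check the CI job"
# pattern both produce n identical pairs, so this check can't tell them apart.
polling = [("sleep 60 && check_ci_status", "still running")] * 3
print(is_stuck(polling))  # True, even though the agent is just waiting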

I guess the solution here might look like this: we take another LLM, or some model, give it the last 10 actions and obs, and ask it something like: "has this agent collapsed into sending the same response over and over again, or is this a legitimate agent history where the agent is waiting for a process to finish?"
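
Roughly, something like this (a rough sketch; complete() is a placeholder for whatever LLM client we'd actually use):

def looks_like_doom_loop(history, complete, n=10):
    recent = history[-n:]
    transcript = "\n".join(f"ACTION: {a}\nOBSERVATION: {o}" for a, o in recent)
    prompt = (
        "Below are the last actions and observations of a coding agent.\n"
        "Answer YES if the agent has collapsed into sending the same response "
        "over and over with no progress, or NO if it is legitimately waiting "
        "for a long-running process to finish.\n\n" + transcript
    )
    return complete(prompt).strip().upper().startswith("YES")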

enyst avatar Mar 11 '25 16:03 enyst

I'm running into the same issue. In my case I'm running a CI pipeline that usually takes about 5-10 minutes to run. In the meantime the agent sets a 30-second sleep and checks if the build has completed. Eventually the agent throws the error that it's in a loop and stops.

I also explicitly mention in my request that the build is going to take 5+ minutes, but it seems the LLMs like to respond with a 30-second sleep time, so that hasn't helped either.

hardiksd avatar Mar 21 '25 07:03 hardiksd

Waiting for pipelines, hitting bug loops, and being stuck on the same action are three different kinds of issues, each needing a different mitigation strategy: https://github.com/All-Hands-AI/OpenHands/issues/7960. If it is something like waiting for a process to be over, I think a Cursor update (maybe Windsurf has this too) has some interesting solutions. The common one is to have a "split" window for the agent to observe whether the process is done yet and patiently wait for it to finish. Alternatively, they can treat the whole terminal output as context for the agent to check while working on other issues.

BradKML avatar May 20 '25 02:05 BradKML

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Jun 19 '25 02:06 github-actions[bot]

I think this is fixed by the addition of the timeout attribute. Please feel free to reopen if you see fit.

enyst avatar Jun 19 '25 14:06 enyst