[Bug]: Loop detection kills agents that are waiting on long-running processes
Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
Describe the bug and reproduction steps
We have loop detection in OpenHands to prevent agents from getting stuck in a loop and waiting forever. However, a confluence of two things has made loop detection potentially harmful:
- Agents based on the new claude-3.5-sonnet are much better and rarely get stuck in loops
- We have a 2-minute timeout on agents, which means that they need to repeatedly sleep while waiting for long-running processes
As a result, I have recently seen multiple instances of agents waiting on processes to finish but getting killed with "agent got stuck in a loop" when they waited too long. I don't know conceptually what the best solution is, but I wanted to flag it as an issue.
OpenHands Installation
Docker command in README
OpenHands Version
No response
Operating System
None
Logs, Errors, Screenshots, and Additional Context
No response
Do we have an example log from this? Sorry, off-hand I don't see why waiting would do that... 🤔
To elaborate a bit why this confuses me:
- the loop detection works on history (actions and obs that have been completed and are in history); it doesn't check things during the creation of an observation
- the loop detection check should be attempted once per step, regardless of how long the step took...
And as I was writing this and checking the code, it seems the last point isn't quite true anymore: when we moved the check from the end of the step to the beginning of the step, it ended up before the check for whether there's a delegate in progress... which doesn't seem like the correct thing to do. It means the parent agent will endlessly check the same thing (the last steps of its own history) while waiting for a delegate.
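To make the ordering concern concrete, here is a minimal sketch of the difference between running the stuck check before versus after the delegate check. The names (`delegate`, `_is_stuck`, `_handle_delegate_step`, etc.) are hypothetical placeholders, not the actual AgentController API:

```python
# Hypothetical sketch, not the real OpenHands controller code.
class Controller:
    def __init__(self):
        self.delegate = None   # a child agent currently doing the work, if any
        self.history = []      # completed (action, observation) pairs

    def _is_stuck(self) -> bool:
        # Placeholder for the deterministic repeat checks discussed in this thread.
        tail = self.history[-3:]
        return len(tail) == 3 and all(step == tail[0] for step in tail)

    def step(self):
        # If the stuck check ran *before* this delegate check, the parent would
        # re-examine the same frozen tail of its own history on every step
        # while the delegate works, instead of skipping straight to the delegate.
        if self.delegate is not None:
            self._handle_delegate_step()
            return
        if self._is_stuck():
            raise RuntimeError("Agent got stuck in a loop")
        self._run_agent_step()

    def _handle_delegate_step(self):
        pass  # drive the delegate; the parent's history does not advance here

    def _run_agent_step(self):
        pass  # normal action/observation cycle that appends to history
```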
Was this seen with a delegate?
I'm still not sure why it would trigger the stop, because the first remark above should still apply (it spins over and over its own completed steps), but we should move it down:
- https://github.com/All-Hands-AI/OpenHands/pull/5458
Sorry to elaborate more: what happened in the original bug is that the agent would send many actions in a row like
It looks like the action is still running, let me try again in a moment.
```
<execute_ipython>
time.sleep(60)
# GitHub action checking code
</execute_ipython>
```
Doing this several times in a row is actually desired behavior, not an undesirable loop. But our loop detection kills it anyway.
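To illustrate why a purely deterministic repeat check flags this pattern, here is a small self-contained sketch; the check is simplified and hypothetical, not the exact OpenHands stuck detector:

```python
# Simplified, hypothetical repeat check: "same (action, observation) N times in a row".
def looks_stuck(history, n=3):
    tail = history[-n:]
    return len(tail) == n and all(step == tail[0] for step in tail)

# Legitimate polling while a long-running process finishes...
polling = [("time.sleep(60); check GitHub Action status", "still running")] * 3
# ...is indistinguishable, to this kind of check, from a genuine doom loop.
print(looks_stuck(polling))  # True -> the agent gets killed even though it is just waiting
```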
Ah! That's very interesting. I've thought about this before, when we were trying to split commands ourselves; here the agent is doing it on its own.
Is the "thought" also the same? I would really love to see a log of this, or at least trajectories.
Edited to add: I believe there is some check that ignores thoughts... but it's for editing errors. 🤔 I mean, while this could happen anyway with identical thoughts, I wonder if there's anything we can do to make it extremely rare to begin with.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
Seconding this, but with model training on a laptop (with 0.20 and GPU integration; prompt below). I think a very hacky way of solving this is to literally have the agent make small talk so that it won't just cause issues.
Please achieve the following:
- Clone the latest main AI Scientist repo (keep it separate from GPU installation aids) https://github.com/SakanaAI/AI-Scientist and note that `OPENROUTER_API_KEY` is `sk-or-v1-[REDACTED]`; do not use any other API keys, including but not limited to Semantic Scholar `S2_API_KEY`, Anthropic (Claude Sonnet-3.5) `ANTHROPIC_API_KEY`, OpenAI (GPT-4o) `OPENAI_API_KEY`, and DeepSeek (use OpenRouter and not native with `DEEPSEEK_API_KEY`)
- Successfully use nanoGPT_lite as a sanity check to see that the AI Scientist application is working using OpenAlex; see the linked README document, and if there are errors in downloading data files please consider adding User Agents and Headers https://raw.githubusercontent.com/SakanaAI/AI-Scientist/refs/heads/main/README.md
- Create a local branch of changes such that Sakana AI can use `openrouter/deepseek/deepseek-chat` (or another available model) through OpenRouter instead of being limited to `openrouter/meta-llama/llama-3.1-405b-instruct`, or using the DeepSeek API to access the DeepSeek model
Note:
- be mindful of folder case, for example between the `AI-Scientist` repo and the `ai_scientist` folder
- it is good to have small talk while waiting for experiments that are running in the background; check every 10 minutes
I think a new iteration of this issue is the agent waiting while forgetting that a Python pip or Linux apt command has already finished (probably other package managers and other scripts too). @enyst which tool would that be an issue from?
There were some changes around this loop detection recently. Have any of those resolved this issue?
No, it hasn't; this is a different kind of issue and it probably needs a different kind of solution. Usually we have a deterministic set of checks to stop the loop if, for example, the agent is doing exactly the same action and getting the same obs 3 times in a row, because we know LLMs sometimes fall into that and it will likely never stop on its own.
But sometimes the actions are legitimately repetitive, like when a process hasn't finished yet and the agent says "let me check the output to see if it finished" multiple times. If we let it continue, the agent would likely move on to other work once the process finishes; it's not necessarily in a doom loop.
Those simple checks for "the same" action can't tell the difference between these two scenarios.
I guess the solution here might look like this: we take another LLM, or some model, give it the last 10 actions and obs, and ask it something like "has this agent collapsed into sending the same response over and over again, or is this a legitimate agent history where the agent is waiting for a process to finish?"
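A rough sketch of what that could look like; nothing here exists in OpenHands, and `ask_llm` is a placeholder for whichever completion call the project would actually use:

```python
def ask_llm(prompt: str) -> str:
    # Placeholder: swap in a real completion call (litellm, OpenAI SDK, etc.).
    raise NotImplementedError

def is_doom_loop(history, window: int = 10) -> bool:
    """Ask a secondary model whether recent history is a collapsed loop
    or a legitimate wait on a long-running process."""
    recent = "\n".join(f"ACTION: {action}\nOBS: {obs}" for action, obs in history[-window:])
    prompt = (
        "Here are the agent's last steps:\n"
        f"{recent}\n\n"
        "Has this agent collapsed into sending the same response over and over again, "
        "or is this a legitimate history where the agent is waiting for a process to finish? "
        "Answer with exactly LOOP or WAITING."
    )
    return ask_llm(prompt).strip().upper().startswith("LOOP")
```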
I'm running into the same issue. In my case I'm running a CI pipeline that usually takes about 5-10 minutes. In the meantime the agent sets a 30-second sleep and checks if the build has completed. Eventually the agent throws the error that it's in a loop and stops.
I also explicitly mention in my request that the build is going to take 5+ minutes, but it seems the LLMs like to respond with a 30-second sleep time, so that hasn't helped either.
Waiting for pipelines, hitting bug loops, and being stuck on the same action are three different kinds of issues, each needing a different mitigation strategy: https://github.com/All-Hands-AI/OpenHands/issues/7960 If it is something like waiting for a process to finish, I think a Cursor update (maybe Windsurf has this too) has some weird solutions. The common one is to have a "split" window for the agent to observe whether the process is done yet and patiently wait for it to finish. Alternatively, they can treat the whole terminal output as context for the agent to check while working on other issues.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
I think this is fixed by the addition of the timeout attribute. Please feel free to reopen if you see fit.