[Bug]: Rate Limit Error
Is there an existing issue for the same bug?
- [x] I have checked the existing issues.
Describe the bug and reproduction steps
I'm currently on Tier 2 of the Anthropic API, which has an 80,000 input tokens per minute limit, so after a while I get the RateLimitError. The problem is that after it returns the rate limit error three or four times, it raises another Python error that is not handled by any exception handler, which leaves the system permanently stuck in an "Agent is Rate Limited" state; the only solution is to restart the OpenHands instance.
So I think a couple of things would be helpful:
- The ability to set a custom rate limit on the UI side for APIs that are rate limited (with the ability to set a refresh time)
- Add the ability to truncate the prompt input, to help stay under rate limits and input token size limits
- Fix the "Agent is Rate Limited" infinite state
OpenHands Installation
Docker command in README
OpenHands Version
0.25
Operating System
Linux
Logs, Errors, Screenshots, and Additional Context
litellm.llms.anthropic.common_utils.AnthropicError: {"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your organization’s rate limit of 80,000 input tokens per minute. For details, refer to: https://docs.anthropic.com/en/api/rate-limits; see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase."}}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/openhands/controller/agent_controller.py", line 238, in _step_with_exception_handling
await self._step()
File "/app/openhands/controller/agent_controller.py", line 674, in _step
action = self.agent.step(self.state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/openhands/agenthub/codeact_agent/codeact_agent.py", line 130, in step
response = self.llm.completion(**params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 336, in wrapped_f
return copy(f, *args, **kw)
^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 475, in __call__
do = self.iter(retry_state=retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 376, in iter
result = action(retry_state)
^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 418, in exc_check
raise retry_exc.reraise()
^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 185, in reraise
raise self.last_attempt.result()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 478, in __call__
result = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/app/openhands/llm/llm.py", line 235, in wrapper
resp: ModelResponse = self._completion_unwrapped(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1190, in wrapper
raise e
File "/app/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1068, in wrapper
result = original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/litellm/main.py", line 3085, in completion
raise exception_type(
^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2202, in exception_type
raise e
File "/app/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 548, in exception_type
raise RateLimitError(
litellm.exceptions.RateLimitError: litellm.RateLimitError: AnthropicException - {"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your organization’s rate limit of 80,000 input tokens per minute. For details, refer to: https://docs.anthropic.com/en/api/rate-limits; see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase."}}
@enyst just for my own understanding, I know Tier 1 always gets rate limited, but is Tier 2 not enough anymore? Or does it depend?
+1 for bringing the rate limit setting to the UI
As a workaround, you can try a higher retry window. It won't solve the infinite hang, but it can reduce the likelihood of getting there.
```bash
docker run -it --rm --pull=always \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.25-nikolaik \
  -e LOG_ALL_EVENTS=true \
  -e LLM_NUM_RETRIES=6 \
  -e LLM_RETRY_MIN_WAIT=5 \
  -e LLM_RETRY_MAX_WAIT=90 \
  -e LLM_RETRY_MULTIPLIER=2 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ~/.openhands-state:/.openhands-state \
  -p 3000:3000 \
  --add-host host.docker.internal:host-gateway \
  --name openhands-app \
  docker.all-hands.dev/all-hands-ai/openhands:0.25
```
Thank you for the report. Yes, it depends, and it's always possible to bump into those limits. For example, I couldn't really run an eval with multiple processes without a lot of pain, even on a Tier 3 account.
Thank you Eric for the command! That's exactly right, we can tweak those options. Docs for those options are here: https://docs.all-hands.dev/modules/usage/configuration-options#retrying
I am not sure what happens with the "infinite state" when RateLimitError is hit, though; that sounds like a bug. It should just display in the UI, in real time I think, that the agent is rate limited while the LLM continues to retry. Cc: @raymyers
I am getting this as well with Tier 1. I found a workaround, though. The new Claude 3.7 Sonnet did not seem to have an Input Tokens per Minute limit, so I have been using that instead without any problems so far.
EDIT: They added an Input Tokens per Minute limit, at 20,000 for Tier 1.
I'm even getting this with Tier 3 ;) - the bigger your codebase is, the more has to be sent, especially if you build features that cut across your application. It feels amazing in the beginning, but wait until you start burning tokens. :)
https://github.com/manzke/rag-chat-interface built by OpenHands
In the latest release, does enabling the Condenser in Advanced settings make any difference here?
Looks good, to be honest. I haven't hit the limit (Tier 3) yet, while GitHub Copilot kills me a lot of the time. I think the biggest difference right now between OpenHands and GitHub Copilot is the deep integration into VS Code; GitHub Copilot is much better at replacing certain parts. Continuing the research with a far bigger project :)
One workaround I have found is that once your agent gets rate limited, you can type "continue" in the chat window and the agent resumes. It's annoying to do that, but I'm looking to see if there is a way to automate it with browser control, so that there is an option to resume after the agent gets into that state.
I run into these limits with Tier 2, but I also just bought enough credits to go to Tier 4 to see how much it helps. My hunch is it will still get there, but it will probably run a lot longer before it does.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for over 30 days with no activity.
Can we introduce a backoff policy or something so the agent can continue its activities?
@wi-ski There already is one; OpenHands should retry a number of times. You can configure how many times using the LLM_NUM_RETRIES environment variable. Is something not working with it?
What is the backoff policy? To be effective for this error, the code would need to wait a full minute before retrying.
@jeffskla There was an issue in the default retry settings, resulting in a very short 18s wait that wasn't long enough for the per-minute limit to reset. I submitted a PR (#9489) that changes the default, but if you need to, you can try setting the new values from the PR on your end through a config file or env vars. See the LLM configuration docs page for more details.
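For reference, a minimal sketch of what longer retry settings could look like in config.toml; the key names mirror the LLM_NUM_RETRIES / LLM_RETRY_* env vars from the docker command above, and the values here are illustrative, not the new defaults from the PR:

```toml
# Illustrative values only - pick waits long enough for a per-minute limit to reset
[llm]
num_retries = 8        # retry the failed LLM call this many times before giving up
retry_min_wait = 15    # seconds to wait before the first retry
retry_max_wait = 120   # cap the backoff above a full rate-limit window (60s)
retry_multiplier = 2   # exponential backoff factor between attempts
```

The same values can also be passed as the corresponding LLM_* environment variables in the docker run command.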
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
Is this issue still valid?
I haven't seen this since version 0.48.
I got this today with gpt-5:
litellm.llms.openai.common_utils.OpenAIError: Error code: 429 - {'error': {'message': 'Request too large for gpt-5 in organization org-Iy7CYzFimGGc4FXAEblsTmWC on tokens per min (TPM): Limit 30000, Requested 30683. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
30k per minute is... low for OpenHands 😅 We are going to fix this eventually, but it's not trivial currently. Do you think you can get a rate limit increase?
Nope, I am not made of enough money to spend that kind of cash on OpenAI for them to bump me to the next tier 😅
So a rate limit setting for the various models would be much appreciated.
Unfortunately, 30k per minute means OH needs to stay under 30k for any future request in that conversation, because sending, let's say, 35k tokens in the next minute still exceeds "30k per minute". So you effectively cannot use that conversation anymore unless you condense its contents.
You could limit the max_size to a lower number, with CLI mode or headless mode, in config.toml like here.
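For illustration, a minimal config.toml sketch of such a condenser limit; only max_size comes from the comment above, while the [condenser] section name, the "llm" type, and keep_first are assumptions taken from the current configuration docs, so verify them against your version:

```toml
# Illustrative sketch - section and key names assumed from the docs; only max_size is quoted above
[condenser]
type = "llm"      # summarize older events instead of resending the full history
max_size = 40     # condense once the history grows past this many events,
                  # which caps how many tokens each request can accumulate
keep_first = 4    # always keep the first few events (e.g. the original task)
```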
This issue is stale because it has been open for 40 days with no activity. Remove the stale label or leave a comment, otherwise it will be closed in 10 days.
This issue was automatically closed due to 50 days of inactivity. We do this to help keep the issues somewhat manageable and focus on active issues.