[Bug]: Rate Limit Error
Is there an existing issue for the same bug?
- [x] I have checked the existing issues.
Describe the bug and reproduction steps
I'm currently on Tier 2 of the Anthropic API, which has an 80,000 input tokens per minute limit, so after a while I get the RateLimitError. The problem is that after it returns the rate limit error three or four times, it raises another Python error that is not handled by any exception handler, which leaves the system permanently stuck in an "Agent is Rate Limited" state; the only solution is to restart the OpenHands instance.
So I think a couple of things would be helpful:
- The ability to set a custom rate limit on the UI side for APIs that are rate limited (with the ability to set a refresh time)
- Add the ability to truncate the prompt input, to help stay under rate limits and input token size limits
- Fix the "Agent is Rate Limited" infinite state
OpenHands Installation
Docker command in README
OpenHands Version
0.25
Operating System
Linux
Logs, Errors, Screenshots, and Additional Context
litellm.llms.anthropic.common_utils.AnthropicError: {"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your organization’s rate limit of 80,000 input tokens per minute. For details, refer to: https://docs.anthropic.com/en/api/rate-limits; see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase."}}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/openhands/controller/agent_controller.py", line 238, in _step_with_exception_handling
await self._step()
File "/app/openhands/controller/agent_controller.py", line 674, in _step
action = self.agent.step(self.state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/openhands/agenthub/codeact_agent/codeact_agent.py", line 130, in step
response = self.llm.completion(**params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 336, in wrapped_f
return copy(f, *args, **kw)
^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 475, in __call__
do = self.iter(retry_state=retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 376, in iter
result = action(retry_state)
^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 418, in exc_check
raise retry_exc.reraise()
^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 185, in reraise
raise self.last_attempt.result()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 478, in __call__
result = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/app/openhands/llm/llm.py", line 235, in wrapper
resp: ModelResponse = self._completion_unwrapped(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1190, in wrapper
raise e
File "/app/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1068, in wrapper
result = original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/litellm/main.py", line 3085, in completion
raise exception_type(
^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2202, in exception_type
raise e
File "/app/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 548, in exception_type
raise RateLimitError(
litellm.exceptions.RateLimitError: litellm.RateLimitError: AnthropicException - {"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your organization’s rate limit of 80,000 input tokens per minute. For details, refer to: https://docs.anthropic.com/en/api/rate-limits; see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase."}}
@enyst just for my own understanding, I know Tier 1 always gets rate limited, but is Tier 2 not enough anymore? Or does it depend?
+1 for bringing the rate limit setting to the UI
As a workaround, you can try a higher retry window. It won't solve the infinite hang, but it can reduce the likelihood of getting there.
```bash
docker run -it --rm --pull=always \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.25-nikolaik \
  -e LOG_ALL_EVENTS=true \
  -e LLM_NUM_RETRIES=6 \
  -e LLM_RETRY_MIN_WAIT=5 \
  -e LLM_RETRY_MAX_WAIT=90 \
  -e LLM_RETRY_MULTIPLIER=2 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ~/.openhands-state:/.openhands-state \
  -p 3000:3000 \
  --add-host host.docker.internal:host-gateway \
  --name openhands-app \
  docker.all-hands.dev/all-hands-ai/openhands:0.25
```
Thank you for the report. Yes, it depends, and it's always possible to bump into those limits. For example, I couldn't really run an eval with multiple processes without a lot of pain, even on a Tier 3 account.
Thank you Eric for the command! That's exactly right, we can tweak those options. Docs for those options are here: https://docs.all-hands.dev/modules/usage/configuration-options#retrying
I am not sure what happens with the "infinite state" when RateLimitError is hit, though; that sounds like a bug. It should just display in the UI, in real time I think, that the agent is rate limited while the LLM continues to retry. Cc: @raymyers
I am getting this as well with Tier 1. I found a workaround, though. The new Claude 3.7 Sonnet did not seem to have an Input Tokens per Minute limit, so I have been using that instead without any problems so far.
EDIT: They added an Input Tokens per Minute limit, at 20,000 for Tier 1.
I'm even getting this with Tier 3 ;) - the bigger your codebase is, the more has to be sent, especially if you build features that cut across your application. It feels amazing in the beginning, but wait until you start burning tokens. :)
https://github.com/manzke/rag-chat-interface built by OpenHands
In the latest release, does enabling the Condenser in Advanced settings make any difference here?
Looks good, to be honest. I haven't hit the limit (Tier 3) yet, while GitHub Copilot kills me a lot of the time. I think the biggest difference right now between OpenHands and GitHub Copilot is the deep integration into VS Code; GitHub Copilot is much better at replacing certain parts. Continuing the research with a far bigger project :)
One workaround I have found is that once your agent gets rate limited, you can type "continue" in the chat window and the agent resumes. It's annoying to do that, but I'm looking to see if there is a way to automate it with browser control, so that there is an option to resume after the agent gets into that state.
I run into these limits with Tier 2, but I also just bought enough credits to go to Tier 4 to see how much it helps. My hunch is it will still get there, but it will probably run a lot longer before it does.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for over 30 days with no activity.
Can we introduce a backoff policy or something so the agent can continue its activities?
@wi-ski There already is one; OpenHands should retry a number of times. You can configure how many times using the LLM_NUM_RETRIES environment variable. Is something not working with it?
What is the backoff policy? To be effective for this error, the code would need to wait a full minute before retrying.
@jeffskla There was an issue in the default retry settings, resulting in a very short 18s wait that wasn't long enough for the per-minute limit to reset. I submitted a PR (#9489) that changes the default, but if you need to, you can try setting the new values from the PR on your end through a config file or env vars. See the LLM configuration docs page for more details.
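For reference, a minimal sketch of what longer retry settings could look like in config.toml; the key names mirror the LLM_NUM_RETRIES / LLM_RETRY_* env vars from the docker command above, and the values here are illustrative, not the new defaults from the PR:

```toml
# Illustrative values only - pick waits long enough for a per-minute limit to reset
[llm]
num_retries = 8        # retry the failed LLM call this many times before giving up
retry_min_wait = 15    # seconds to wait before the first retry
retry_max_wait = 120   # cap the backoff above a full rate-limit window (60s)
retry_multiplier = 2   # exponential backoff factor between attempts
```

The same values can also be passed as the corresponding LLM_* environment variables in the docker run command.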
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
Is this issue still valid?
I haven't seen this since version 0.48.
I got this today with gpt-5:
litellm.llms.openai.common_utils.OpenAIError: Error code: 429 - {'error': {'message': 'Request too large for gpt-5 in organization org-Iy7CYzFimGGc4FXAEblsTmWC on tokens per min (TPM): Limit 30000, Requested 30683. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
30k per minute is... low for OpenHands 😅 We are going to fix this eventually, but it's not trivial currently. Do you think you can get a rate limit increase?
Nope, I am not made of enough money to spend that kind of cash on OpenAI for them to bump me to the next tier 😅
So a rate limit setting for the various models would be much appreciated.
Unfortunately, 30k per minute means OH needs to stay under 30k for any future request in that conversation, because sending, let's say, 35k tokens in the next minute still exceeds "30k per minute". So you effectively cannot use that conversation anymore unless you condense its contents.
You could limit the max_size to a lower number, with CLI mode or headless mode, in config.toml like here.
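For illustration, a minimal config.toml sketch of such a condenser limit; only max_size comes from the comment above, while the [condenser] section name, the "llm" type, and keep_first are assumptions taken from the current configuration docs, so verify them against your version:

```toml
# Illustrative sketch - section and key names assumed from the docs; only max_size is quoted above
[condenser]
type = "llm"      # summarize older events instead of resending the full history
max_size = 40     # condense once the history grows past this many events,
                  # which caps how many tokens each request can accumulate
keep_first = 4    # always keep the first few events (e.g. the original task)
```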
This issue is stale because it has been open for 40 days with no activity. Remove the stale label or leave a comment, otherwise it will be closed in 10 days.
This issue was automatically closed due to 50 days of inactivity. We do this to help keep the issues somewhat manageable and focus on active issues.