[Bug]: (eval) Command execution error when retrying after rate limit error

Open ryanhoangt opened this issue 1 year ago • 2 comments

Is there an existing issue for the same bug?

  • [X] I have checked the troubleshooting document at https://docs.all-hands.dev/modules/usage/troubleshooting
  • [X] I have checked the existing issues.

Describe the bug

When running the SWE-bench evaluation, some instances hit a rate limit error. After the retry, the agent sometimes (not always) can no longer execute commands.

This seems to be because the OpenHands backend cannot communicate with the runtime client to execute actions.
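For context on the retry path: in the log in the comments, litellm raises a `RateLimitError` (HTTP 429), OpenHands logs "Attempt #1", and the next LLM call then succeeds. The sketch below is a generic retry-with-exponential-backoff wrapper. It is not OpenHands' or litellm's actual retry code. `RateLimitError` here is a hypothetical local stand-in class, and `call_with_retries` and its parameters are hypothetical names chosen for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 error (not litellm's class)."""


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying only on RateLimitError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # give up and surface the last rate-limit error to the caller
            # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay,
            # plus up to 10% random jitter so parallel workers don't retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

A wrapper like this re-issues only the LLM call and never touches the runtime connection. That matches the reported log: the LLM calls succeed after the retry, while every `/execute_action` request is refused, so the runtime side seems to need its own investigation.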

The log file for one affected instance is attached: instance_django__django-12308.log

The error messages are at lines 3809 and 3876.

Current OpenHands version

0.9.4

Installation and Configuration

Followed the instructions in the SWE-bench README.

Model and Agent

  • Model: claude-3-5-sonnet@20240620
  • Agent: CoActPlannerAgent

Operating System

Linux

Reproduction Steps

No response

Logs, Errors, Screenshots, and Additional Context

No response

ryanhoangt • Sep 23 '24 17:09

Extracting the relevant pieces of the log here. 🤔 After one retry for the RateLimitError, the backend can't connect to /execute_action. The agent performs a few more steps, including successful calls to the LLM, but none of its actions reach the runtime.
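`[Errno 111] Connection refused` means the TCP connection to `localhost:39138` was actively rejected, which typically happens when no process is listening on that port at that moment. One way to check this from the host during a run is a plain TCP probe. The helper below, `port_accepts_connections` (a hypothetical name, not part of OpenHands), is a minimal standard-library sketch of such a probe.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established.

    A refused connection (ECONNREFUSED, errno 111 on Linux) usually means
    nothing is listening on that port, e.g. the server process has exited
    or is bound to a different port.
    """
    try:
        # create_connection returns a connected socket; close it immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, timeouts and unreachable hosts are all OSError subclasses.
        return False
```

For example, calling `port_accepts_connections("localhost", 39138)` right after the first refused `/execute_action` would show whether anything was still listening on the runtime's port at that point.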

You were running a branch, right? Can you check which commit from main was the last one included in the branch you ran?

Log

2024-09-23 14:43:57,724 - INFO - CmdOutputObservation (source=EventSource.AGENT, exit code=1)
Creating test database for alias 'default'...
Destroying test database for alias 'default'...
SystemCheckError: System check identified some issues:

ERRORS:
(admin.E403) A 'django.template.backends.django.DjangoTemplates' instance must be configured in TEMPLATES in order to use the admin application.
(admin.E408) 'django.contrib.auth.middleware.AuthenticationMiddleware' must be in MIDDLEWARE in order to use the admin application.
(admin.E409) 'django.contrib.messages.middleware.MessageMiddleware' must be in MIDDLEWARE in order to use the admin application.
'django.contrib.sessions.middleware.SessionMiddleware' must be in MIDDLEWARE in order to use the admin application.

System check identified 4 issues (0 silenced).

[Python Interpreter: /opt/miniconda3/envs/testbed/bin/python] root@0658734e4e57:/workspace/django__django__3.1 #
2024-09-23 14:43:57,724 - INFO - LocalExecutorAgent LEVEL 1 LOCAL STEP 11 GLOBAL STEP 23
2024-09-23 14:44:02,000 - ERROR - exception_type: RateLimitError

2024-09-23 14:44:02,001 - ERROR - litellm.RateLimitError: RateLimitError: OpenAIException - Error code: 429 - {'error': {'message': 'litellm.RateLimitError: BedrockException - {"message":"Too many requests, please wait before trying again. You have sent too many requests. Wait before trying again."}\nReceived Model Group=claude-3-5-sonnet@20240620\nAvailable Model Group Fallbacks=None\nError doing the fallback: list index out of range', 'type': None, 'param': None, 'code': '429'}}. Attempt #1 | You can customize retry values in the configuration.
2024-09-23 14:45:12,582 - INFO - Cost: 0.04 USD | Accumulated Cost: 0.34 USD Input tokens: 11077 | Output tokens: 540

2024-09-23 14:45:12,583 - INFO - IPythonRunCellAction THOUGHT: I apologize for the oversight. It seems we need to add more configuration to our test settings. Let's update the test_settings.py file to address these issues: CODE: edit_file_by_replace('/workspace/django__django__3.1/tests/admin_utils/test_settings.py', ''' SECRET_KEY = 'dummy_secret_key_for_testing' INSTALLED_APPS = [ 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.contenttypes', 'django.contrib.sessions', 'django.contrib.messages', 'django.contrib.staticfiles', 'tests.admin_utils', ] ...

2024-09-23 14:45:12,584 - INFO - -----------------------------------Container logs:----------------------------------- |INFO: 172.17.0.1:39732 - "POST /execute_action HTTP/1.1" 200 OK


2024-09-23 14:45:12,586 - ERROR - Error during command execution: HTTPConnectionPool(host='localhost', port=39138): Max retries exceeded with url: /execute_action (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x79964bd2dc90>: Failed to establish a new connection: [Errno 111] Connection refused'))
2024-09-23 14:45:12,586 - INFO - ErrorObservation(content="Command execution failed: HTTPConnectionPool(host='localhost', port=39138): Max retries exceeded with url: /execute_action (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x79964bd2dc90>: Failed to establish a new connection: [Errno 111] Connection refused'))", observation='error')
2024-09-23 14:45:12,685 - INFO - LocalExecutorAgent LEVEL 1 LOCAL STEP 12 GLOBAL STEP 24
2024-09-23 14:45:20,871 - INFO - Cost: 0.04 USD | Accumulated Cost: 0.38 USD Input tokens: 11708 | Output tokens: 389

2024-09-23 14:45:20,872 - INFO - CmdRunAction (source=EventSource.AGENT) THOUGHT: I apologize for the connection error. It seems there might be an issue with the Jupyter environment. Let's try to update the file using a different method. COMMAND: cat << EOF > /workspace/django__django__3.1/tests/admin_utils/test_settings.py SECRET_KEY = 'dummy_secret_key_for_testing' INSTALLED_APPS = [ 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.contenttypes', 'django.contrib.sessions', 'django.contrib.messages', 'django.contrib.staticfiles', 'tests.admin_utils', ] ... EOF

2024-09-23 14:45:20,874 - ERROR - Error during command execution: HTTPConnectionPool(host='localhost', port=39138): Max retries exceeded with url: /execute_action (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7996359f1910>: Failed to establish a new connection: [Errno 111] Connection refused'))
2024-09-23 14:45:20,874 - INFO - ErrorObservation(content="Command execution failed: HTTPConnectionPool(host='localhost', port=39138): Max retries exceeded with url: /execute_action (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7996359f1910>: Failed to establish a new connection: [Errno 111] Connection refused'))", observation='error')
2024-09-23 14:45:20,974 - INFO - LocalExecutorAgent LEVEL 1 LOCAL STEP 13 GLOBAL STEP 25
2024-09-23 14:45:38,327 - INFO - Cost: 0.04 USD | Accumulated Cost: 0.43 USD Input tokens: 12188 | Output tokens: 548

2024-09-23 14:45:38,327 - INFO - AgentFinishAction(outputs={'content': "I apologize for the continued connection issues. It seems that we're having problems executing commands through the provided interface. Given these limitations, I'll provide a summary of what we've attempted....

enyst • Sep 23 '24 19:09

As far as I can see, the latest commit from main on the branch is 01462e11d710327d6a389ce2d34c792e780ca3c4, which is after the 0.9.4 release.

ryanhoangt • Sep 24 '24 03:09

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] • Oct 26 '24 01:10

Has this happened again since then, Ryan?

enyst • Oct 26 '24 03:10

I don't think so. I've sometimes hit retries after a rate limit error and everything still worked fine. I'm not sure whether it was addressed by an earlier PR, but maybe we can close this issue for now.

ryanhoangt • Oct 26 '24 06:10