[Bug]: (eval) Command execution error when retrying after rate limit error
Is there an existing issue for the same bug?
- [X] I have checked the troubleshooting document at https://docs.all-hands.dev/modules/usage/troubleshooting
- [X] I have checked the existing issues.
Describe the bug
When running the SWE-bench evaluation, some instances hit a rate limit error. After the retry, the agent sometimes cannot execute commands anymore.
This seems to be because the OpenHands backend cannot communicate with the runtime client to execute actions.
Below is the log file for one instance: instance_django__django-12308.log
The error messages can be found at lines 3809 and 3876.
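To locate these errors in other instance logs, a minimal sketch like the following may help. It assumes the failures are logged with the text "Error during command execution", which is taken from the log excerpt quoted later in this thread; the helper name `find_error_lines` is hypothetical and not part of OpenHands.

```python
from typing import Iterable, List

# Marker text copied from the log excerpt in this issue (an assumption that
# other failing instances log the same message).
MARKER = "Error during command execution"


def find_error_lines(lines: Iterable[str], marker: str = MARKER) -> List[int]:
    """Return the 1-based line numbers of all lines that contain the marker."""
    return [n for n, line in enumerate(lines, start=1) if marker in line]
```

For a log file, pass the open file object, for example `find_error_lines(open("instance_django__django-12308.log"))`.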
Current OpenHands version
0.9.4
Installation and Configuration
Followed the instructions in the SWE-bench README.
Model and Agent
- Model: claude-3-5-sonnet@20240620
- Agent: CoActPlannerAgent
Operating System
Linux
Reproduction Steps
No response
Logs, Errors, Screenshots, and Additional Context
No response
Here are the relevant pieces of the log. 🤔 After one retry for the RateLimitError, the backend can't connect to /execute_action. The agent performs a few more steps, including successful calls to the LLM, but none of them reach the runtime.
You were running a branch, right? Can you please check, what exactly is the last commit from main that was included, that you were running with?
Log
2024-09-23 14:43:57,724 - INFO - CmdOutputObservation (source=EventSource.AGENT, exit code=1)
Creating test database for alias 'default'...
Destroying test database for alias 'default'...
SystemCheckError: System check identified some issues:
ERRORS:
(admin.E403) A 'django.template.backends.django.DjangoTemplates' instance must be configured in TEMPLATES in order to use the admin application.
(admin.E408) 'django.contrib.auth.middleware.AuthenticationMiddleware' must be in MIDDLEWARE in order to use the admin application.
(admin.E409) 'django.contrib.messages.middleware.MessageMiddleware' must be in MIDDLEWARE in order to use the admin application.
'django.contrib.sessions.middleware.SessionMiddleware' must be in MIDDLEWARE in order to use the admin application.
System check identified 4 issues (0 silenced).
[Python Interpreter: /opt/miniconda3/envs/testbed/bin/python] root@0658734e4e57:/workspace/django__django__3.1 #
2024-09-23 14:43:57,724 - INFO - LocalExecutorAgent LEVEL 1 LOCAL STEP 11 GLOBAL STEP 23
2024-09-23 14:44:02,000 - ERROR - exception_type: RateLimitError
2024-09-23 14:44:02,001 - ERROR - litellm.RateLimitError: RateLimitError: OpenAIException - Error code: 429 - {'error': {'message': 'litellm.RateLimitError: BedrockException - {"message":"Too many requests, please wait before trying again. You have sent too many requests. Wait before trying again."}\nReceived Model Group=claude-3-5-sonnet@20240620\nAvailable Model Group Fallbacks=None\nError doing the fallback: list index out of range', 'type': None, 'param': None, 'code': '429'}}. Attempt #1 | You can customize retry values in the configuration.
2024-09-23 14:45:12,582 - INFO - Cost: 0.04 USD | Accumulated Cost: 0.34 USD
Input tokens: 11077 | Output tokens: 540
2024-09-23 14:45:12,583 - INFO - IPythonRunCellAction THOUGHT: I apologize for the oversight. It seems we need to add more configuration to our test settings. Let's update the test_settings.py file to address these issues: CODE: edit_file_by_replace('/workspace/django__django__3.1/tests/admin_utils/test_settings.py', ''' SECRET_KEY = 'dummy_secret_key_for_testing' INSTALLED_APPS = [ 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.contenttypes', 'django.contrib.sessions', 'django.contrib.messages', 'django.contrib.staticfiles', 'tests.admin_utils', ] ...
2024-09-23 14:45:12,584 - INFO - -----------------------------------Container logs:----------------------------------- |INFO: 172.17.0.1:39732 - "POST /execute_action HTTP/1.1" 200 OK
2024-09-23 14:45:12,586 - ERROR - Error during command execution: HTTPConnectionPool(host='localhost', port=39138): Max retries exceeded with url: /execute_action (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x79964bd2dc90>: Failed to establish a new connection: [Errno 111] Connection refused'))
2024-09-23 14:45:12,586 - INFO - ErrorObservation(content="Command execution failed: HTTPConnectionPool(host='localhost', port=39138): Max retries exceeded with url: /execute_action (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x79964bd2dc90>: Failed to establish a new connection: [Errno 111] Connection refused'))", observation='error')
2024-09-23 14:45:12,685 - INFO - LocalExecutorAgent LEVEL 1 LOCAL STEP 12 GLOBAL STEP 24
2024-09-23 14:45:20,871 - INFO - Cost: 0.04 USD | Accumulated Cost: 0.38 USD
Input tokens: 11708 | Output tokens: 389
2024-09-23 14:45:20,872 - INFO - CmdRunAction (source=EventSource.AGENT) THOUGHT: I apologize for the connection error. It seems there might be an issue with the Jupyter environment. Let's try to update the file using a different method. COMMAND: cat << EOF > /workspace/django__django__3.1/tests/admin_utils/test_settings.py SECRET_KEY = 'dummy_secret_key_for_testing' INSTALLED_APPS = [ 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.contenttypes', 'django.contrib.sessions', 'django.contrib.messages', 'django.contrib.staticfiles', 'tests.admin_utils', ] ... EOF
2024-09-23 14:45:20,874 - ERROR - Error during command execution: HTTPConnectionPool(host='localhost', port=39138): Max retries exceeded with url: /execute_action (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7996359f1910>: Failed to establish a new connection: [Errno 111] Connection refused'))
2024-09-23 14:45:20,874 - INFO - ErrorObservation(content="Command execution failed: HTTPConnectionPool(host='localhost', port=39138): Max retries exceeded with url: /execute_action (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7996359f1910>: Failed to establish a new connection: [Errno 111] Connection refused'))", observation='error')
2024-09-23 14:45:20,974 - INFO - LocalExecutorAgent LEVEL 1 LOCAL STEP 13 GLOBAL STEP 25
2024-09-23 14:45:38,327 - INFO - Cost: 0.04 USD | Accumulated Cost: 0.43 USD
Input tokens: 12188 | Output tokens: 548
2024-09-23 14:45:38,327 - INFO - AgentFinishAction(outputs={'content': "I apologize for the continued connection issues. It seems that we're having problems executing commands through the provided interface. Given these limitations, I'll provide a summary of what we've attempted....
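The `[Errno 111] Connection refused` in the log above means that nothing was accepting TCP connections on `localhost:39138` at that moment, which suggests the runtime client inside the container had stopped listening (why it stopped is not visible in this excerpt). A minimal sketch for checking this by hand while an evaluation is stuck, using only the Python standard library; the helper name `runtime_port_is_open` is hypothetical and not part of OpenHands, and the port number is the one from this particular log:

```python
import socket


def runtime_port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # which is errno 111 on Linux) when nothing is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `runtime_port_is_open("localhost", 39138)` right after the error appears would show whether the port is still refused or whether the runtime has come back.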
As far as I can see, the latest commit from main on the branch is 01462e11d710327d6a389ce2d34c792e780ca3c4, which is after the 0.9.4 release.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
Has this still happened since then, Ryan?
I don't think so. I have since seen retries after rate limit errors, and everything still worked fine. I'm not sure whether an earlier PR addressed it; maybe we can close this issue for now.