fixes #7341
- [ ] This change is worth documenting at https://docs.all-hands.dev/
- [ ] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below
End-user friendly description of the problem this fixes or functionality that this introduces.
Give a summary of what the PR does, explaining any non-trivial design decisions.
Enhance check_if_alive stability
Link of any specific issues this addresses.
- fix https://github.com/All-Hands-AI/OpenHands/issues/7341
The remote runtime is not the Docker runtime. When I develop the OpenHands project on my local machine, it always throws this exception; you can see more detail in #7341.
For the new version of check_if_alive
https://github.com/All-Hands-AI/OpenHands/blob/24773e15c53b0f9b34e3883248f56c2ade9b88a1/openhands/runtime/impl/action_execution/action_execution_client.py#L117-L128
you can add RequestHTTPError to _is_retryable_check_alive_error(exception) like this:
import httpcore
import httpx
from openhands.runtime.utils.request import RequestHTTPError

def _is_retryable_check_alive_error(exception):
    # Treat RequestHTTPError as retryable too.
    return isinstance(
        exception, (httpx.RemoteProtocolError, httpcore.RemoteProtocolError, RequestHTTPError)
    )
to fix the bug caused by the local Docker environment.
I think this may have been solved in https://github.com/All-Hands-AI/OpenHands/pull/7548/files
@SKYhuangjing can you see if that works for you?
You can try it by changing the version tag to main in the docker run instructions
@rbren I've adopted the new version of the code you mentioned, but the error openhands.runtime.utils.request.RequestHTTPError: 503 Server Error: Service Unavailable, which occurs in the local Docker environment, still isn't caught by retry_if_exception.
However, the problem was solved when I added httpx.HTTPStatusError to _is_retryable_wait_until_alive_error in openhands/runtime/impl/docker/docker_runtime.py like this:
def _is_retryable_wait_until_alive_error(exception):
    if isinstance(exception, tenacity.RetryError):
        cause = exception.last_attempt.exception()
        return _is_retryable_wait_until_alive_error(cause)
    return isinstance(
        exception, (ConnectionError, httpx.NetworkError, httpx.RemoteProtocolError, httpx.HTTPStatusError)
    )
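The 503 escaped because a predicate-based retry only retries exceptions that the predicate accepts; anything else propagates on the first attempt. The sketch below imitates that behaviour with a hand-written loop rather than tenacity itself. `retry_if`, `ServiceUnavailable`, and `make_flaky_check` are hypothetical names used only for illustration and are not part of OpenHands.

```python
from typing import Callable


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for the 503 error raised while the server starts."""


def retry_if(
    predicate: Callable[[BaseException], bool],
    attempts: int,
    fn: Callable[[], str],
) -> str:
    """Call fn up to `attempts` times, retrying only errors the predicate accepts."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Re-raise at once if the error is not retryable or no attempts remain.
            if not predicate(exc) or attempt == attempts:
                raise
    raise RuntimeError("attempts must be at least 1")


def make_flaky_check(failures: int) -> Callable[[], str]:
    """Build a check that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def flaky_check() -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise ServiceUnavailable("503 Service Unavailable")
        return "alive"

    return flaky_check


# A predicate that knows about the 503 error: the check is retried until it succeeds.
print(retry_if(lambda e: isinstance(e, ServiceUnavailable), 5, make_flaky_check(2)))

# A predicate that only knows ConnectionError: the 503 propagates on the first attempt.
try:
    retry_if(lambda e: isinstance(e, ConnectionError), 5, make_flaky_check(2))
except ServiceUnavailable as exc:
    print("not retried:", exc)
```

Under these assumptions, widening the predicate is what turns the startup 503 into a retry instead of an immediate failure.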
OK great--want to open a PR that adds that error to the retryable errors?
I tested it and it does not work: the server throws a 503, but the retry does not catch it.
@xingyaoww @Randonee1 httpx.HTTPStatusError also covers 4xx responses. I think we should retry only on 5xx errors; a 4xx is probably a program error.
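One way to express the 5xx-only idea is a predicate that inspects the status code before deciding to retry. This is a minimal sketch, not the project's code: `is_retryable_status_error` is a hypothetical helper, and it assumes the exception exposes `response.status_code` the way `httpx.HTTPStatusError` does. `FakeStatusError` is a stand-in that exists only so the snippet runs without httpx installed.

```python
from types import SimpleNamespace


class FakeStatusError(Exception):
    """Stand-in for httpx.HTTPStatusError (hypothetical, for illustration only)."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        # The real exception carries a full response; only status_code is mimicked.
        self.response = SimpleNamespace(status_code=status_code)


def is_retryable_status_error(exception: BaseException) -> bool:
    """Return True only for server-side (5xx) HTTP status errors."""
    response = getattr(exception, "response", None)
    status_code = getattr(response, "status_code", None)
    if not isinstance(status_code, int):
        return False  # not an HTTP status error this helper understands
    return 500 <= status_code < 600


print(is_retryable_status_error(FakeStatusError(503)))  # True
print(is_retryable_status_error(FakeStatusError(404)))  # False
```

A check like this could stand in for the bare httpx.HTTPStatusError entry in _is_retryable_wait_until_alive_error, so that a 404 or 401 fails fast while a 503 from a container that is still starting is retried.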
This PR is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This PR was closed because it has been stalled for over 30 days with no activity.