
fixes #7341

SKYhuangjing opened this issue 9 months ago • 7 comments

  • [ ] This change is worth documenting at https://docs.all-hands.dev/
  • [ ] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

End-user friendly description of the problem this fixes or functionality that this introduces.


Give a summary of what the PR does, explaining any non-trivial design decisions.

Enhance check_if_alive stability.


Link of any specific issues this addresses.

  • fix https://github.com/All-Hands-AI/OpenHands/issues/7341

SKYhuangjing, Mar 20 '25

The remote runtime is not the Docker runtime. When I develop the OpenHands project on my local machine, it always throws this exception; see #7341 for more details.

SKYhuangjing, Mar 21 '25

For the new version of check_if_alive (https://github.com/All-Hands-AI/OpenHands/blob/24773e15c53b0f9b34e3883248f56c2ade9b88a1/openhands/runtime/impl/action_execution/action_execution_client.py#L117-L128), you can add RequestHTTPError to _is_retryable_check_alive_error(exception) like this:

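import httpcore
import httpx
from openhands.runtime.utils.request import RequestHTTPError

def _is_retryable_check_alive_error(exception):
    # Also retry when the server answers with an HTTP error (e.g. a 503 while the runtime starts)
    return isinstance(
        exception, (httpx.RemoteProtocolError, httpcore.RemoteProtocolError, RequestHTTPError)
    )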

This fixes the bug triggered by the local Docker environment.

Randonee1, Mar 27 '25
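Why the tuple matters: with tenacity's retry_if_exception, a failed attempt is retried only if the predicate returns True for the raised exception; otherwise the exception propagates at once. The stdlib-only sketch below imitates that behavior so it runs without OpenHands or tenacity installed. retry_call, FakeRequestHTTPError and make_flaky_check are hypothetical names, not part of either library.

```python
import time


class FakeRequestHTTPError(Exception):
    """Hypothetical stand-in for an HTTP error such as a 503 Service Unavailable."""


def retry_call(func, is_retryable, attempts=5, delay=0.0):
    """Call func, retrying only while is_retryable(exc) returns True (hypothetical helper)."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            # Non-retryable errors, or a failure on the last attempt, propagate unchanged
            if not is_retryable(exc) or attempt == attempts:
                raise
            time.sleep(delay)


def is_retryable_without_http(exc):
    return isinstance(exc, ConnectionError)


def is_retryable_with_http(exc):
    # Adding the HTTP error type to the tuple is the change proposed above
    return isinstance(exc, (ConnectionError, FakeRequestHTTPError))


def make_flaky_check(failures):
    """Return a check that raises a fake 503 `failures` times, then succeeds."""
    state = {"calls": 0}

    def check():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise FakeRequestHTTPError("503 Server Error: Service Unavailable")
        return "alive"

    return check


print(retry_call(make_flaky_check(2), is_retryable_with_http))  # prints: alive
try:
    retry_call(make_flaky_check(2), is_retryable_without_http)
except FakeRequestHTTPError as exc:
    print("not retried:", exc)
```

Without the HTTP error in the tuple, the very first 503 escapes the retry loop, which matches the symptom reported in #7341.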

I think this may have been solved in https://github.com/All-Hands-AI/OpenHands/pull/7548/files

@SKYhuangjing can you see if that works for you?

You can try it by changing the version tag to main in the docker run instructions

rbren, Mar 28 '25

@rbren I've adopted the new version of the code you mentioned, but the error "openhands.runtime.utils.request.RequestHTTPError: 503 Server Error: Service Unavailable", which occurs in the local Docker environment, is still not caught by retry_if_exception.

However, the problem was solved when I added httpx.HTTPStatusError to _is_retryable_wait_until_alive_error in openhands/runtime/impl/docker/docker_runtime.py like this:

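import httpx
import tenacity

def _is_retryable_wait_until_alive_error(exception):
    # Unwrap tenacity's RetryError and classify the exception from the last attempt
    if isinstance(exception, tenacity.RetryError):
        cause = exception.last_attempt.exception()
        return _is_retryable_wait_until_alive_error(cause)

    # httpx.HTTPStatusError is raised by raise_for_status() for 4xx and 5xx responses
    return isinstance(
        exception, (ConnectionError, httpx.NetworkError, httpx.RemoteProtocolError, httpx.HTTPStatusError)
    )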

Randonee1, Apr 01 '25
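The recursive branch in the snippet above handles tenacity.RetryError, which tenacity raises by default when a decorated call exhausts its attempts; the real error is reached through last_attempt.exception(). Below is a stdlib-only sketch of the same unwrapping, assuming hypothetical FakeRetryError and FakeHTTPStatusError classes in place of tenacity.RetryError and httpx.HTTPStatusError. For brevity the sketch stores the wrapped error in a hypothetical last_exception attribute.

```python
class FakeHTTPStatusError(Exception):
    """Hypothetical stand-in for httpx.HTTPStatusError."""


class FakeRetryError(Exception):
    """Hypothetical stand-in for tenacity.RetryError; keeps the last attempt's exception."""

    def __init__(self, last_exception):
        super().__init__(f"RetryError[{last_exception!r}]")
        self.last_exception = last_exception


RETRYABLE = (ConnectionError, FakeHTTPStatusError)


def is_retryable(exception):
    # Unwrap (possibly nested) retry wrappers, then classify the underlying error
    if isinstance(exception, FakeRetryError):
        return is_retryable(exception.last_exception)
    return isinstance(exception, RETRYABLE)


print(is_retryable(FakeRetryError(FakeHTTPStatusError("503 Service Unavailable"))))  # True
print(is_retryable(FakeRetryError(ValueError("bad input"))))  # False
```

Without the unwrapping step, an outer retry loop would see only the wrapper type and treat every exhausted inner retry as non-retryable.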

OK, great. Want to open a PR that adds that error to the retryable errors?

rbren, Apr 01 '25

I tested the change from #7548 and it still fails: the server returns a 503, but the retry does not catch it.

SKYhuangjing, Apr 02 '25

@xingyaoww @Randonee1 httpx.HTTPStatusError also covers 4xx responses. I think we should only retry on 5xx errors, since a 4xx is more likely a bug in the program.

SKYhuangjing, Apr 02 '25
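A minimal sketch of the 5xx-only idea: retry connection failures and HTTP status errors whose code is 500-599, and let 4xx responses fail fast. In httpx the code is available as exception.response.status_code; the hypothetical FakeResponse and FakeHTTPStatusError classes below copy that attribute shape so the example runs with the standard library alone. This is an illustration, not code from the OpenHands repository.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for httpx.Response (only status_code is modeled)."""
    status_code: int


class FakeHTTPStatusError(Exception):
    """Hypothetical stand-in for httpx.HTTPStatusError, which exposes .response."""

    def __init__(self, message, response):
        super().__init__(message)
        self.response = response


def is_retryable_status_error(exception):
    # Transport-level failures are always worth another attempt
    if isinstance(exception, ConnectionError):
        return True
    if isinstance(exception, FakeHTTPStatusError):
        # 5xx: server-side and often transient (e.g. 503 while the runtime starts);
        # 4xx: likely a bug in the request, so fail fast instead of retrying
        return 500 <= exception.response.status_code < 600
    # Any other exception type is not retried
    return False


print(is_retryable_status_error(FakeHTTPStatusError("503", FakeResponse(503))))  # True
print(is_retryable_status_error(FakeHTTPStatusError("404", FakeResponse(404))))  # False
```

With tenacity, a predicate like this would be passed as retry=retry_if_exception(is_retryable_status_error).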

This PR is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot], May 03 '25

This PR was closed because it has been stalled for over 30 days with no activity.

github-actions[bot], May 11 '25