
Ability to customize retry status codes for client

Open jeffcarrico opened this issue 1 year ago • 3 comments

First check

  • [X] I added a descriptive title to this issue.
  • [X] I used the GitHub search to find a similar request and didn't find it.
  • [X] I searched the Prefect documentation for this feature.

Prefect Version

2.x

Describe the current behavior

If a client request to the API fails with any status other than 429, the client does not retry it, and the flow fails. When the failure comes from an intermittent problem in the network infrastructure between the client and the API server, a retry is usually a successful mitigation. It would be helpful to be able to retry additional status codes via a setting.

Describe the proposed behavior

Allow retrying status codes other than 429 (see https://github.com/PrefectHQ/prefect/blob/main/src/prefect/client.py#L260) via a setting, e.g. PREFECT_API_ADDITIONAL_RETRY_HTTP_STATUS=500,502.

Alternatively, the setting could be simplified to PREFECT_API_RETRY_ANY_FAILURE=true.

Example Use

prefect config set PREFECT_API_ADDITIONAL_RETRY_HTTP_STATUS=500,502

or

prefect config set PREFECT_API_RETRY_ANY_FAILURE=true
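A minimal sketch of how a client might read the two proposed settings. The environment variable names come from this request; the helpers `retryable_status_codes` and `is_retryable_status` are hypothetical and not part of Prefect, and treating "any failure" as "any 4xx/5xx response" is an assumption.

```python
import os

# 429 is the only status code the client retries today, per this issue.
DEFAULT_RETRY_STATUS_CODES = frozenset({429})


def retryable_status_codes(environ=os.environ):
    """Return the default codes plus any listed in the proposed setting."""
    raw = environ.get("PREFECT_API_ADDITIONAL_RETRY_HTTP_STATUS", "")
    extra = {int(part) for part in raw.split(",") if part.strip()}
    return DEFAULT_RETRY_STATUS_CODES | extra


def is_retryable_status(status_code, environ=os.environ):
    """Decide whether a response with this status code should be retried."""
    # Assumed meaning of PREFECT_API_RETRY_ANY_FAILURE: retry any 4xx/5xx response.
    if environ.get("PREFECT_API_RETRY_ANY_FAILURE", "").lower() == "true":
        return status_code >= 400
    return status_code in retryable_status_codes(environ)
```

With neither setting present, only 429 is retried, so existing behavior would be unchanged.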

Additional context

No response

jeffcarrico • Sep 08 '22 18:09

That implementation is specific to 429s so that it can respect the rate-limit back-off headers. If feasible, it'd be good to reuse standard retry handling from httpx. I'm not sure that exists, though, and our existing implementation does have a fallback for when Retry-After is not specified.
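For illustration, the "respect Retry-After, otherwise fall back" behavior could look roughly like the sketch below. This is not Prefect's implementation: the function name `retry_delay`, the backoff base, the cap, and the jitter range are all assumptions, and `headers` is a plain dict here (so the key is case-sensitive, unlike httpx's header objects).

```python
import random


def retry_delay(headers, attempt, base=2.0, max_delay=60.0):
    """Seconds to wait before retry number `attempt` (1-based)."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            # Retry-After given as a number of seconds; HTTP-date values
            # are not parsed in this sketch and fall through to the backoff.
            return max(0.0, float(retry_after))
        except ValueError:
            pass
    # Fallback when the header is missing or unparseable:
    # capped exponential backoff with jitter.
    backoff = min(max_delay, base ** attempt)
    return backoff * random.uniform(0.5, 1.0)
```

Because only the header lookup is specific to 429, the same delay function could serve other status codes, which would simply take the fallback branch.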

See https://github.com/encode/httpx/issues/108

We may also need a different retry mechanism for connection failures; see https://www.python-httpx.org/advanced/#usage_1

I'd recommend changing the prefix of these settings to PREFECT_CLIENT. For example:

  • PREFECT_CLIENT_RETRY_STATUS_CODES=500,502
  • PREFECT_CLIENT_RETRY_METHODS=POST,PATCH

We can define defaults for these in our settings; if you want to override them, you must provide a comprehensive list. This allows users to opt out of retries for some codes/methods.
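A rough sketch of the "defaults plus comprehensive override" idea. The setting names are the ones suggested above; the default values, `load_retry_policy`, and `should_retry` are hypothetical, since the thread does not say what the defaults would be.

```python
import os

# Hypothetical defaults; the actual values would be defined in Prefect's settings.
DEFAULT_RETRY_STATUS_CODES = "429,502,503"
DEFAULT_RETRY_METHODS = "GET,POST,PATCH"


def _csv(environ, name, default):
    """Split a comma-separated setting, ignoring blanks and whitespace."""
    raw = environ.get(name, default)
    return [part.strip() for part in raw.split(",") if part.strip()]


def load_retry_policy(environ=os.environ):
    """Return (status_codes, methods); a user-supplied list replaces the default."""
    codes = frozenset(
        int(c) for c in _csv(environ, "PREFECT_CLIENT_RETRY_STATUS_CODES", DEFAULT_RETRY_STATUS_CODES)
    )
    methods = frozenset(
        m.upper() for m in _csv(environ, "PREFECT_CLIENT_RETRY_METHODS", DEFAULT_RETRY_METHODS)
    )
    return codes, methods


def should_retry(method, status_code, policy):
    """Retry only when both the method and the status code are allowed."""
    codes, methods = policy
    return method.upper() in methods and status_code in codes
```

Since an override replaces the default list rather than extending it, setting PREFECT_CLIENT_RETRY_METHODS=GET here would opt out of retries for POST and PATCH.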

zanieb • Sep 08 '22 18:09

@abrookins it'd be great to see this land on the roadmap as retries are important for handling service disruptions.

zanieb • Sep 08 '22 18:09

It would also be useful to have a configurable maximum number of retries rather than the hardcoded value (https://github.com/PrefectHQ/prefect/blob/main/src/prefect/client.py#L252), e.g. PREFECT_CLIENT_MAX_RETRIES=5
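As a sketch of what a configurable limit might look like: the loop below reads the proposed PREFECT_CLIENT_MAX_RETRIES setting instead of a constant. The function `call_with_retries`, the `send` callable (which returns a status code), the fallback of 5, and the simple exponential sleep are assumptions for illustration, not Prefect's client code.

```python
import os
import time


def call_with_retries(send, environ=os.environ, sleep=time.sleep):
    """Call `send()` until it returns a non-429 status or retries run out."""
    # Assumed fallback of 5 when the setting is absent; negative values mean no retries.
    max_retries = max(0, int(environ.get("PREFECT_CLIENT_MAX_RETRIES", "5")))
    for attempt in range(max_retries + 1):
        status = send()
        if status != 429 or attempt == max_retries:
            return status
        # Wait 1s, 2s, 4s, ... between attempts.
        sleep(2 ** attempt)
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.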

BitTheByte • Sep 11 '22 18:09