prefect
Ability to customize retry status codes for client
First check
- [X] I added a descriptive title to this issue.
- [X] I used the GitHub search to find a similar request and didn't find it.
- [X] I searched the Prefect documentation for this feature.
Prefect Version
2.x
Describe the current behavior
If a client request to the API fails with any status other than 429, the client does not retry, and the flow fails. When the failure comes from an intermittent problem in the network infrastructure between the client and the API server, a retry is usually a successful mitigation. It would be helpful to be able to retry additional status codes via a setting.
Describe the proposed behavior
Allow retries for status codes besides 429 (see https://github.com/PrefectHQ/prefect/blob/main/src/prefect/client.py#L260) via a setting, e.g. PREFECT_API_ADDITIONAL_RETRY_HTTP_STATUS=500,502.
Alternatively, the setting could be simplified to PREFECT_API_RETRY_ANY_FAILURE=true.
Example Use
prefect config set PREFECT_API_ADDITIONAL_RETRY_HTTP_STATUS=500,502
or
prefect config set PREFECT_API_RETRY_ANY_FAILURE=true
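To make the intended semantics concrete, here is a minimal sketch of how the two proposed settings might be interpreted. It is not Prefect code: the helper names `retryable_status_codes` and `should_retry` are hypothetical, it assumes the settings are read as environment variables, and it assumes "any failure" means any 4xx/5xx response.

```python
import os


def retryable_status_codes(env=None):
    """Return the status codes to retry: 429 plus any additional codes."""
    env = os.environ if env is None else env
    codes = {429}  # 429 is always retried, matching the current behavior
    raw = env.get("PREFECT_API_ADDITIONAL_RETRY_HTTP_STATUS", "")
    # Accept a comma-separated list such as "500,502"; skip empty items.
    codes.update(int(part) for part in raw.split(",") if part.strip())
    return codes


def should_retry(status_code, env=None):
    """Decide whether a response with this status code should be retried."""
    env = os.environ if env is None else env
    if env.get("PREFECT_API_RETRY_ANY_FAILURE", "").lower() == "true":
        # Assumption: "any failure" covers every 4xx and 5xx response.
        return status_code >= 400
    return status_code in retryable_status_codes(env)
```

With PREFECT_API_ADDITIONAL_RETRY_HTTP_STATUS=500,502 set, a 502 would be retried while a 404 would still fail immediately.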
Additional context
No response
That implementation is specific to 429s in order to respect the rate limit back-off headers. If feasible, it'd be good to reuse standard retry handling for httpx. I'm not sure that exists, though, and our existing implementation does have a fallback when Retry-After is not specified.
See https://github.com/encode/httpx/issues/108
We may also need to use a different retry mechanism for connection failures: https://www.python-httpx.org/advanced/#usage_1
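For reference, the "respect Retry-After, otherwise fall back" behavior described above could look roughly like the sketch below. The function name `retry_delay` and the `base` and `cap` values are hypothetical and chosen for illustration; this is not the existing client implementation.

```python
def retry_delay(headers, attempt, base=2.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    value = headers.get("Retry-After")
    if value is not None:
        try:
            # Honor the server's back-off when it is given in seconds.
            return max(0.0, float(value))
        except ValueError:
            # Retry-After may also be an HTTP-date; this sketch does not
            # parse it and falls through to the exponential fallback.
            pass
    # Fallback: exponential back-off, capped so waits stay bounded.
    return min(cap, base ** attempt)
```

The same fallback path could serve status codes such as 500 or 502, which typically arrive without a Retry-After header.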
I'd recommend changing the prefix of these settings to PREFECT_CLIENT. For example:
- PREFECT_CLIENT_RETRY_STATUS_CODES=500,502
- PREFECT_CLIENT_RETRY_METHODS=POST,PATCH
We can define defaults for these in our settings and if you want to override them you must provide a comprehensive list. This allows users to opt out of retries for some codes/methods.
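A small sketch of the "defaults, fully replaced on override" semantics, assuming the values come from environment variables. The default sets and the `load_retry_settings` name are placeholders for illustration, not actual Prefect defaults.

```python
import os

# Placeholder defaults for illustration only.
DEFAULT_RETRY_STATUS_CODES = frozenset({429})
DEFAULT_RETRY_METHODS = frozenset({"GET", "POST", "PATCH", "PUT", "DELETE"})


def _csv(env, name):
    """Return the comma-separated items of a setting, or None if unset."""
    raw = env.get(name)
    if raw is None:
        return None
    return [item.strip() for item in raw.split(",") if item.strip()]


def load_retry_settings(env=None):
    """Return (status_codes, methods); an override replaces the defaults."""
    env = os.environ if env is None else env
    codes = _csv(env, "PREFECT_CLIENT_RETRY_STATUS_CODES")
    methods = _csv(env, "PREFECT_CLIENT_RETRY_METHODS")
    return (
        DEFAULT_RETRY_STATUS_CODES
        if codes is None
        else frozenset(int(code) for code in codes),
        DEFAULT_RETRY_METHODS
        if methods is None
        else frozenset(method.upper() for method in methods),
    )
```

Because an override replaces the defaults rather than extending them, setting PREFECT_CLIENT_RETRY_STATUS_CODES=500,502 here would stop 429 from being retried unless it is listed too, and an empty value would disable status-code retries entirely.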
@abrookins it'd be great to see this land on the roadmap as retries are important for handling service disruptions.
It would also be useful to have a configurable maximum number of retries rather than the hardcoded value at https://github.com/PrefectHQ/prefect/blob/main/src/prefect/client.py#L252, e.g. PREFECT_CLIENT_MAX_RETRIES=5
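As an illustration, a retry loop that reads its limit from such a setting might look like this. The `call_with_retries` helper, the `send` callable returning a status code, and the fallback default of 5 are all assumptions made for the sketch, not the current client code.

```python
import os
import time


def call_with_retries(send, retry_codes=frozenset({429}), env=None, sleep=time.sleep):
    """Call `send()` until it returns a non-retryable status or retries run out."""
    env = os.environ if env is None else env
    # Assumed default of 5 when the setting is absent.
    max_retries = int(env.get("PREFECT_CLIENT_MAX_RETRIES", "5"))
    for attempt in range(max_retries + 1):
        status = send()
        if status not in retry_codes or attempt == max_retries:
            return status
        sleep(2 ** attempt)  # simple exponential back-off between attempts
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.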