                        Add RetryPolicy in ExtensionServiceSpec
Hello,
I'm using Contour with an ExtensionService to manage authentication. I have an average load of 500 req/s. I'm experiencing an issue where, on some occasions, the connection to the authentication service is reset/closed by Envoy, resulting in the original request failing with a PERMISSION_DENIED error. This is not ideal for downstream services.
This issue almost always coincides with an Envoy cx destroy event.
Since network connections are never 100% stable, what should we do to handle such errors and avoid returning a PERMISSION_DENIED error?
I'm wondering if it would be possible to add a RetryPolicy in the ExtensionServiceSpec to handle this type of situation.
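To make the request concrete, here is a minimal Go sketch of what such a field might look like. Everything in it is an assumption for illustration: the `RetryPolicy` type, its `count` and `perTryTimeout` fields (loosely modeled on the retry policy HTTPProxy routes already have), and the `Validate` helper are hypothetical and not part of Contour's current ExtensionService API.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// RetryPolicy is a hypothetical, reduced retry policy for an ExtensionService.
// The field names are assumptions, not Contour's actual API.
type RetryPolicy struct {
	// NumRetries is the maximum number of retries; 0 means no retries.
	NumRetries uint32 `json:"count,omitempty"`
	// PerTryTimeout is a Go duration string such as "250ms"; empty means unset.
	PerTryTimeout string `json:"perTryTimeout,omitempty"`
}

// ExtensionServiceSpec shows only the field relevant to this sketch.
type ExtensionServiceSpec struct {
	// RetryPolicy is the proposed optional field (hypothetical).
	RetryPolicy *RetryPolicy `json:"retryPolicy,omitempty"`
}

// Validate checks that the per-try timeout, if set, is a positive duration.
// A nil policy is valid and means "no retries configured".
func (p *RetryPolicy) Validate() error {
	if p == nil || p.PerTryTimeout == "" {
		return nil
	}
	d, err := time.ParseDuration(p.PerTryTimeout)
	if err != nil {
		return fmt.Errorf("invalid perTryTimeout %q: %w", p.PerTryTimeout, err)
	}
	if d <= 0 {
		return errors.New("perTryTimeout must be positive")
	}
	return nil
}

func main() {
	spec := ExtensionServiceSpec{
		RetryPolicy: &RetryPolicy{NumRetries: 2, PerTryTimeout: "250ms"},
	}
	if err := spec.RetryPolicy.Validate(); err != nil {
		fmt.Println("invalid retry policy:", err)
		return
	}
	fmt.Println("retry policy accepted")
}
```

Keeping the field an optional pointer would leave existing ExtensionService resources unchanged when no policy is set.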
@jerome-quere that sounds reasonable, would you be interested in contributing a change here?
Unrelated to the topic at hand, but it's great to see users diagnosing issues using Envoy's stats output. If you're interested in contributing in this area too, please let us know! Here's another issue that would be great to get some user input on: https://github.com/projectcontour/contour/issues/5655
Is this supported by Envoy? If I understand correctly, Envoy doesn't support retries for ext-auth services: https://github.com/envoyproxy/envoy/issues/17918
I think it should be possible via:
extension-envoy-filters-http-ext-authz.grpc_service.envoy_grpc.retry_policy setting
Thanks for the pointer. The docs document this field as:
Indicates the retry policy for re-establishing the gRPC stream
I read this as a TCP-level retry that only applies to stream establishment, whereas Contour's RetryPolicy is a more generic retry policy.
I think it might be confusing for users to expose the entire RetryPolicy object when it is only supported with many asterisks.
Knowing the asterisks, is this the type of retry you want?
The Contour project currently lacks enough contributors to adequately respond to all Issues.
This bot triages Issues according to the following rules:
- After 60d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, the Issue is closed
You can:
- Mark this Issue as fresh by commenting
- Close this Issue
- Offer to help out with triage
Please send feedback to the #contour channel in the Kubernetes Slack