envoy
envoy copied to clipboard
OAuth2: add retry policy to the OAuth2 filter
Commit Message: Add retry policy to the OAuth2 filter to avoid failing to call the Authorization server. Additional Description:
Some EG users reported that Authorization servers may timeout TCP connections in the pool but only send an RST when Envoy reuses a connection to send a request to the Authorization server again. This behavior causes the authentication flow to fail. Implementing retries can mitigate this issue.
Risk Level: low Testing: unit test, integration test Docs Changes: API Release Notes: yes Platform Specific Features: [Optional Runtime guard:] [Optional Fixes #Issue] Fixes: https://github.com/envoyproxy/envoy/issues/33572 [Optional Fixes commit #PR or SHA] [Optional Deprecated:] [Optional API Considerations:]
Related EG issue: https://github.com/envoyproxy/gateway/issues/3178
Caveat: I'm not very experienced in c++, so some of my implementation may not adhere to the best practice of c++ or the conventions of Envoy codebase. I welcome any feedback and will consider it a valuable opportunity to learn the right way to contribute to Envoy. Thank you!
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @mattklein123
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).
Hi @envoyproxy/api-shepherds, I have added a retry_policy at the top level of the OAuth2 API for the following reasons:
- It doesn't require deprecating the existing
token_endpoinand adding a new one with retry_policy in it. - It's future-proof, allowing us to reuse the same
retry_policyfor accessing other IdP endpoints, e.g, the logout endpoint.
If we agree on the API change, I'll go ahead to add the implementation.
This requirement looks good to me and the jwt_authn filter, gcp_authn filter, RemoteDataSource, etc. have provided similar API.
But I think my question would be should we add the retry_policy in the HttpUri directly? Then we needn't change the API again and again in the future.
cc @htuch @mattklein123
But I think my question would be should we add the retry_policy in the HttpUri directly? Then we needn't change the API again and again in the future.
Good point! The only issue is that if we add retry_policy directly to HTTPUri, it implies all the APIs referencing HTTPUri are expected to support retries. While it would great to have this feature, we need to modify all the existing implementations to support it.
We can comment the current situation clearly and do the retry support for other positions in future PR on required
I don't think retry policy is a property of a URI but of a service. So you could potentially add it to the recently added envoy.config.core.v3.HttpService. But in this case I don't feel that strongly.
I don't think retry policy is a property of a URI but of a service. So you could potentially add it to the recently added
envoy.config.core.v3.HttpService. But in this case I don't feel that strongly.
I raised this suggestion because the have add timeout in the HttpUri. I agree HttpService is better position but it is come too late and no much scenarios are using it.
cc @htuch
I agree on the history. Going forward, can we deprecate the existing URI field and replace with Service, adding this there?
If we could convert everything to use a Service (applies both to HttpService and GrpcService), this would've been great. However, w/o proper API deprecation mechanism this is probably not worth the effort. At least the code can be refactored to be similar.
@wbpcode @htuch @adisuissa thank you for your thoughtful inputs, I appreciate it. I believe the question of whether the retry policy should be incorporated into HTTPService/GRPCService and the deprecation of HttpUri in existing APIs is a broader topic. Perhaps we could track this in a separate issue if needed?
Regarding this PR, could I proceed with the current standalone retry policy approach?
Now we have no an actual API deprecation mechanism. Even a API is marked as deprecated, Envoy need to support them forever.
So, it 's hard to replace the HttpUri with HttpService. But as @adisuissa said, at least we can make the code be general.
Any way, this is a bigger topic and it's hard to get a final conclusion in short term.
Specific to this PR, I recommend that let's along the 'tranditional' way first to unblock the following works.
I'm fine with retry policy. where it is in this PR as long as we don't put it in HttpUri.
/lgtm api
/wait Needs tests and main merge
Apologies for the delay. I was on vacation last week and will be attending Kubecon next week. I plan to resume work on this PR after Kubecon.