envoy icon indicating copy to clipboard operation
envoy copied to clipboard

OAuth2: add retry policy to the OAuth2 filter

Open zhaohuabing opened this issue 1 year ago • 15 comments

Commit Message: Add retry policy to the OAuth2 filter to avoid failing to call the Authorization server. Additional Description:

Some EG users reported that Authorization servers may timeout TCP connections in the pool but only send an RST when Envoy reuses a connection to send a request to the Authorization server again. This behavior causes the authentication flow to fail. Implementing retries can mitigate this issue.

Risk Level: low Testing: unit test, integration test Docs Changes: API Release Notes: yes Platform Specific Features: [Optional Runtime guard:] [Optional Fixes #Issue] Fixes: https://github.com/envoyproxy/envoy/issues/33572 [Optional Fixes commit #PR or SHA] [Optional Deprecated:] [Optional API Considerations:]

Related EG issue: https://github.com/envoyproxy/gateway/issues/3178

Caveat: I'm not very experienced in c++, so some of my implementation may not adhere to the best practice of c++ or the conventions of Envoy codebase. I welcome any feedback and will consider it a valuable opportunity to learn the right way to contribute to Envoy. Thank you!

zhaohuabing avatar Jul 18 '24 14:07 zhaohuabing

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/). envoyproxy/api-shepherds assignee is @mattklein123 CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

:cat:

Caused by: https://github.com/envoyproxy/envoy/pull/35262 was opened by zhaohuabing.

see: more, trace.

Hi @envoyproxy/api-shepherds, I have added a retry_policy at the top level of the OAuth2 API for the following reasons:

  • It doesn't require deprecating the existing token_endpoin and adding a new one with retry_policy in it.
  • It's future-proof, allowing us to reuse the same retry_policy for accessing other IdP endpoints, e.g, the logout endpoint.

If we agree on the API change, I'll go ahead to add the implementation.

zhaohuabing avatar Jul 19 '24 09:07 zhaohuabing

This requirement looks good to me and the jwt_authn filter, gcp_authn filter, RemoteDataSource, etc. have provided similar API.

But I think my question would be should we add the retry_policy in the HttpUri directly? Then we needn't change the API again and again in the future.

cc @htuch @mattklein123

wbpcode avatar Jul 22 '24 08:07 wbpcode

But I think my question would be should we add the retry_policy in the HttpUri directly? Then we needn't change the API again and again in the future.

Good point! The only issue is that if we add retry_policy directly to HTTPUri, it implies all the APIs referencing HTTPUri are expected to support retries. While it would great to have this feature, we need to modify all the existing implementations to support it.

zhaohuabing avatar Jul 22 '24 09:07 zhaohuabing

We can comment the current situation clearly and do the retry support for other positions in future PR on required

wbpcode avatar Jul 22 '24 15:07 wbpcode

I don't think retry policy is a property of a URI but of a service. So you could potentially add it to the recently added envoy.config.core.v3.HttpService. But in this case I don't feel that strongly.

htuch avatar Jul 23 '24 04:07 htuch

I don't think retry policy is a property of a URI but of a service. So you could potentially add it to the recently added envoy.config.core.v3.HttpService. But in this case I don't feel that strongly.

I raised this suggestion because the have add timeout in the HttpUri. I agree HttpService is better position but it is come too late and no much scenarios are using it.

cc @htuch

wbpcode avatar Jul 23 '24 05:07 wbpcode

I agree on the history. Going forward, can we deprecate the existing URI field and replace with Service, adding this there?

htuch avatar Jul 24 '24 04:07 htuch

If we could convert everything to use a Service (applies both to HttpService and GrpcService), this would've been great. However, w/o proper API deprecation mechanism this is probably not worth the effort. At least the code can be refactored to be similar.

adisuissa avatar Jul 24 '24 13:07 adisuissa

@wbpcode @htuch @adisuissa thank you for your thoughtful inputs, I appreciate it. I believe the question of whether the retry policy should be incorporated into HTTPService/GRPCService and the deprecation of HttpUri in existing APIs is a broader topic. Perhaps we could track this in a separate issue if needed?

Regarding this PR, could I proceed with the current standalone retry policy approach?

zhaohuabing avatar Jul 26 '24 03:07 zhaohuabing

Now we have no an actual API deprecation mechanism. Even a API is marked as deprecated, Envoy need to support them forever.

So, it 's hard to replace the HttpUri with HttpService. But as @adisuissa said, at least we can make the code be general.

Any way, this is a bigger topic and it's hard to get a final conclusion in short term.

Specific to this PR, I recommend that let's along the 'tranditional' way first to unblock the following works.

wbpcode avatar Jul 26 '24 03:07 wbpcode

I'm fine with retry policy. where it is in this PR as long as we don't put it in HttpUri.

htuch avatar Jul 26 '24 04:07 htuch

/lgtm api

wbpcode avatar Jul 30 '24 06:07 wbpcode

/wait Needs tests and main merge

RyanTheOptimist avatar Aug 06 '24 18:08 RyanTheOptimist

Apologies for the delay. I was on vacation last week and will be attending Kubecon next week. I plan to resume work on this PR after Kubecon.

zhaohuabing avatar Aug 19 '24 12:08 zhaohuabing