envoy icon indicating copy to clipboard operation
envoy copied to clipboard

relax backing cluster check for sds

Open ramaraochavali opened this issue 1 year ago • 3 comments

Checking to see what CI thinks Fixes https://github.com/envoyproxy/envoy/issues/12954 Commit Message: Additional Description: Risk Level: Testing: Docs Changes: Release Notes: Platform Specific Features: [Optional Runtime guard:] [Optional Fixes #Issue] [Optional Fixes commit #PR or SHA] [Optional Deprecated:] [Optional API Considerations:]

ramaraochavali avatar Oct 18 '24 10:10 ramaraochavali

As a reminder, PRs marked as draft will not be automatically assigned reviewers, or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

:cat:

Caused by: https://github.com/envoyproxy/envoy/pull/36694 was opened by ramaraochavali.

see: more, trace.

/retest

ramaraochavali avatar Oct 18 '24 14:10 ramaraochavali

would there be cases where SDS is used, and depends on some cluster that isn't ready (and is not checked now), and will cause a cluster/listener to not have its secret?

If cluster/listener not able to get secrets because of SDS is not ready, the requests would fail and downstream_context_secrets_not_ready stat is incremented.

BTW, The behaviour is same - even with static cluster (defined in bootstrap) with static load assignments with a non existing server listening to that endpoint and port. So there is no change in behaviour for dynamic clusters

ramaraochavali avatar Oct 19 '24 08:10 ramaraochavali

If cluster/listener not able to get secrets because of SDS is not ready, the requests would fail and downstream_context_secrets_not_ready stat is incremented.

BTW, The behaviour is same - even with static cluster (defined in bootstrap) with static load assignments with a non existing server listening to that endpoint and port. So there is no change in behaviour for dynamic clusters

Thanks! So I'm assuming this will mainly impact the initialization of Envoy, i.e., up until now Envoy would wait for the static clusters, and the dynamic secrets to arrive before initialization, whereas now it will finish initialization without waiting (or probably failing) to receive the secrets. Is that correct?

adisuissa avatar Oct 22 '24 20:10 adisuissa

So I'm assuming this will mainly impact the initialization of Envoy, i.e., up until now Envoy would wait for the static clusters, and the dynamic secrets to arrive before initialization, whereas now it will finish initialization without waiting (or probably failing) to receive the secrets. Is that correct?

Not really. The current initialization sequence is

  • It initializes the static cluster
  • It initializes the listener/cluster that referred SDS. It throws a validation if the SDS apiconfig source refers to a non static/non EDS cluster. If secret is provided it considers. If the SDS server is unavailable or not send secret, it finishes initialization but puts the secret in warming state. if a request comes to the listener if fails with downstream_context_secrets_not_ready stat.

After this change

  • It will accept any cluster - dynamic/eds
  • It initializes the listener/cluster that referred SDS. It does not do any validation. . If cluster referred in SDS is not available/or it does not respond, the initialization is finished by putting the secret in warming state if a request comes to the listener if fails with downstream_context_secrets_not_ready stat.

So there is absolutely no change in the behaviour either case - except the removal of validation.

ramaraochavali avatar Oct 23 '24 05:10 ramaraochavali

Ok, thanks. I suggest going forward with this, and maybe guard it with a restart-feature-flag.

adisuissa avatar Oct 23 '24 14:10 adisuissa

Ok. Thank you. I will make necessary change and update tests.

ramaraochavali avatar Oct 23 '24 15:10 ramaraochavali

CC @envoyproxy/runtime-guard-changes: FYI only for changes made to (source/common/runtime/runtime_features.cc).

:cat:

Caused by: https://github.com/envoyproxy/envoy/pull/36694 was synchronize by ramaraochavali.

see: more, trace.

/retest

ramaraochavali avatar Oct 27 '24 09:10 ramaraochavali

@adisuissa This is ready for review. Can you PTAL?

ramaraochavali avatar Oct 27 '24 09:10 ramaraochavali

@envoyproxy/senior-maintainers assignee is @ggreenway

:cat:

Caused by: a https://github.com/envoyproxy/envoy/pull/36694#pullrequestreview-2399075852 was submitted by @adisuissa.

see: more, trace.

/retest

ramaraochavali avatar Oct 29 '24 16:10 ramaraochavali

/retest

ramaraochavali avatar Oct 30 '24 03:10 ramaraochavali

/retest

ramaraochavali avatar Oct 30 '24 03:10 ramaraochavali

/retest

ramaraochavali avatar Oct 30 '24 05:10 ramaraochavali

Please also add a test that uses a bootstrap-defined EDS cluster, as that is also explicitly allowed by this change.

Added. PTAL

ramaraochavali avatar Oct 31 '24 10:10 ramaraochavali

/retest

ramaraochavali avatar Nov 01 '24 03:11 ramaraochavali

@ggreenway PTAL when you get chance?

ramaraochavali avatar Nov 01 '24 03:11 ramaraochavali