gateway icon indicating copy to clipboard operation
gateway copied to clipboard

Support cluster settings for non-xRoute BackendRefs

Open guydc opened this issue 1 year ago • 3 comments
trafficstars

Description: Currently, Envoy Gateway supports a variety of envoy cluster settings through the BackendTrafficPolicy resource, such as:

  • TCP Keep Alive
  • Timeouts
  • Health Checks
  • Load Balancing algorithm
  • Circuit Breakers

Envoy Gateway is adopting the use of BackendRef to represent external services in various resources, such as SecurityPolicy (ExtAuth), EnvoyExtensionPolicy (ExtProc), EnvoyProxy (OTEL sinks). With this change, reuse of existing backend-traffic policies that attach to Services, such as BackendTLSPolicy, is now possible for these backends. However, it's not possible to attach a BackendTrafficPolicy to Services. As a result, the above-mentioned cluster settings are not configurable for non-xRoute referenced backends.

Options:

  1. Split BackendTrafficPolicy to two policies. One policy will contain all fields that have an affinity to Envoy listeners (retries, rate limits, Fault... ) and will attach to Gateway and xRoutes, while the other will contain all fields that have affinity to Envoy Clusters and will attach to Services.
  2. Support attachment of BackendTrafficPolicy to Services.
  3. Create an extended BackendRef resource that supports configuration of relevant cluster settings, similar to support for weights in Gateway-API: https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io/v1.BackendRef

guydc avatar Mar 31 '24 19:03 guydc

+1 for 3.

arkodg avatar Apr 01 '24 07:04 arkodg

+1 for 3.

davidalger avatar Apr 02 '24 22:04 davidalger

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

github-actions[bot] avatar May 10 '24 20:05 github-actions[bot]

/assign

alexwo avatar Jun 13 '24 13:06 alexwo

rethinking this one, we should also discuss option 4 - adding cluster settings into the Backend API . This could eliminate duplication of cluster settings if multiple Policies are targeting the same backend.

arkodg avatar Jul 10 '24 00:07 arkodg

Looking at all of the suggested implementation options, I think none of them will work.

The problem is that most (if not all) of the knobs available in BackendTrafficPolicy actually are attached to an Envoy Cluster definition, while the Backend resource and the BackendTargetRef arrays refer specifically to endpoints inside the Cluster definition.

  • Split BackendTrafficPolicy to two policies. One policy will contain all fields that have an affinity to Envoy listeners (retries, rate limits, Fault... ) and will attach to Gateway and xRoutes, while the other will contain all fields that have affinity to Envoy Clusters and will attach to Services.
  • Support attachment of BackendTrafficPolicy to Services.

If BackendTrafficPolicy (or the split part of that policy) attaches to a service, and the container that references the service has multiple services in it (because it's a backendTargetRef array), then it's not possible to decide which of the attached BackendTrafficPolicy resources is relevant for the entire cluster.

  • Create an extended BackendRef resource that supports configuration of relevant cluster settings, similar to support for weights in Gateway-AP

These configurations should apply to an Envoy Cluster, and a BackendRef is translated to endpoints in the Envoy Cluster, not to a Cluster. This means that it's not possible to disambiguate the configuration correctly.

  • ... adding cluster settings into the Backend API

Backend represents an endpoint in a cluster, not a cluster. Adding the cluster settings into the Backend API leads to confusion when building up the Cluster definition if there are differences between the settings attached to each individual BackendRef.

I think what is needed is to create an additional CRD that represents a Cluster that can be used for internal routes. Something like an EGConfigRoute or InternalRoute that can be directly targetted with a BackendTrafficPolicy resource. This would actually represent an Envoy Cluster (i.e. a container of associated backendRefs), and be an internal type of xRoute.

liorokman avatar Jul 15 '24 12:07 liorokman

In the last community meeting, the option to use weighted clusters ~~(clusters within clusters)~~ was raised as a means of translating "cluster" settings derived from attributes found in backendRef.

guydc avatar Jul 17 '24 11:07 guydc

I'm not sure I follow. Could you show an example?

liorokman avatar Jul 17 '24 11:07 liorokman

cc @arkodg

Looking at the envoy config, it seems that weighted clusters are a route feature: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto.html#config-route-v3-weightedcluster. I misunderstood it to be a cluster feature.

For example, here: https://github.com/envoyproxy/gateway/blob/9eff541254221963dcacadb02e18b6ac7cb53c73/internal/xds/translator/testdata/out/xds-ir/http-route-weighted-backend-with-filters.routes.yaml#L31

So, in the absence of an envoy route (which is the case here), this approach will not work. So, I think that we're back to considering a new container as the only viable option...

guydc avatar Jul 17 '24 11:07 guydc

Notes from last community meeting:

  • Weighted Cluster will indeed not work here
  • Prefer to avoid breaking and complicating the yaml API
  • Prefer to align with the GW-API approach of having backend/endpoint-specific attributes as siblings attributes to backend (e.g. weights)

Proposal:

  • Introduce a new container, e.g. "BackendsConfig" that contains cluster-level settings (LB, CircuitBreakers, ... )
  • Reuse BackendsConfig in BackendTrafficPolicy
  • BackendsConfig will be a sibling attribute of []BackendRef in all relevant locations.
  • Consider a wrapper container containing both BackendsConfig and []BackendRef that would be embedded in all relevant locations (xPolicy, EnvoyProxy) where non-xRoute clusters are needed, in backwards compatible fashion.

guydc avatar Jul 23 '24 18:07 guydc

Still need to add support for cluster-level settings for the tracing and for accesslog definitions.

liorokman avatar Aug 07 '24 06:08 liorokman