Revisit Kubernetes Services mapping to Envoy v3.Cluster

Open dprotaso opened this issue 10 months ago • 13 comments

Description: I was surprised to see that Kubernetes Services aren't mapped to Envoy v3.Clusters. I was hoping to create high-level Gateway policy resources that would then reconcile into EnvoyPatchPolicy, since I'd prefer not to build out an entire extension service.

I wanted my policy to target Kubernetes Services and then generate a patch that modifies the corresponding v3.Cluster.

Unfortunately, it seems that Gateway HTTPRoute backends map to Envoy v3.Clusters instead (which has its own side effects: https://github.com/envoyproxy/gateway/issues/5230).
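A rough sketch of the kind of EnvoyPatchPolicy being described here; the Gateway name, the patched field, and especially the cluster name are illustrative placeholders, since today's cluster names are derived from the route and rule rather than from the Service:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyPatchPolicy
metadata:
  name: tune-backend-cluster
  namespace: default
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: eg            # illustrative Gateway name
  type: JSONPatch
  jsonPatches:
    - type: "type.googleapis.com/envoy.config.cluster.v3.Cluster"
      # Today the cluster name is derived from the HTTPRoute/rule, not the Service,
      # which is what makes writing a Service-targeted patch awkward.
      name: "httproute/default/my-route/rule/0"
      operation:
        op: add
        path: "/per_connection_buffer_limit_bytes"
        value: 32768
```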

dprotaso avatar Feb 18 '25 17:02 dprotaso

In general, I agree that in some cases it could be beneficial to map Backends/Services to a single cluster. For example, users may want to apply limits such as circuit breakers on a per-proxy, per-backend basis, rather than having those limits duplicated for every route backendRef (roughly the situation sketched below). Other issues include excessive active health checking caused by duplicated clusters each carrying their own health-check settings, and inconsistent passive health-check status.
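For illustration, this is roughly how such limits are authored today with a BackendTrafficPolicy (resource names and values are illustrative, and exact field names may vary by Envoy Gateway version). Because every route backendRef currently gets its own cluster, the resulting circuit-breaker and health-check settings get stamped onto each of those clusters rather than applied once per backend:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: backend-limits
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: my-route          # illustrative route name
  circuitBreaker:
    maxConnections: 1024
    maxPendingRequests: 256
  healthCheck:
    active:
      type: HTTP
      http:
        path: /healthz
```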

guydc avatar Feb 18 '25 20:02 guydc

I agree with @guydc's points about setting per-backend values. The reason for keeping fields such as circuitBreaker and healthCheck in the BackendTrafficPolicy, instead of a dedicated policy like BackendPolicy, was to prioritize usability and reduce the number of places where users author intent. We should revisit this design decision based on more user feedback.

There are limitations with weighted clusters; adding the ones I can think of here (a minimal traffic-split example follows the list):

  • Traffic Splitting + Session Affinity https://github.com/envoyproxy/envoy/issues/8167
  • Traffic Splitting + Session Persistence https://github.com/envoyproxy/envoy/issues/24741
  • Traffic Splitting + Consistent Hashing Loadbalancing (Maglev) https://github.com/envoyproxy/envoy/issues/21675
  • Retries don't consider hosts in other clusters https://github.com/envoyproxy/envoy/issues/18837#issuecomment-956946870
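For reference, a minimal traffic split of the kind that runs into these limitations (Gateway, Service names, and weights are illustrative); each weighted backendRef currently becomes its own cluster behind a weighted-clusters route action:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: split-route
  namespace: default
spec:
  parentRefs:
    - name: eg                # illustrative Gateway name
  rules:
    - backendRefs:
        # Each weighted backendRef maps to its own cluster today, so combining
        # this split with session affinity/persistence, consistent hashing, or
        # retries hits the issues linked above.
        - name: service-v1
          port: 8080
          weight: 90
        - name: service-v2
          port: 8080
          weight: 10
```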

@mathetake @cpakulski @nezdolik are there any other limitations of weighted clusters you are aware of?

arkodg avatar Feb 27 '25 18:02 arkodg

+1

The Gateway API spec has recently been introducing policies on a per-BackendReference (Service, ServiceImport, Backend) basis as well.

For example:

  • gateway.networking.k8s.io/v1alpha3: BackendTLSPolicy (reference)
  • gateway.networking.k8s.io/v1alpha2: BackendLBPolicy (reference)

Revisiting the current design would allow us to extend custom backend policies cleanly.
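As an illustration of that per-BackendReference pattern, a BackendTLSPolicy targets a Service directly (names are illustrative):

```yaml
apiVersion: gateway.networking.k8s.io/v1alpha3
kind: BackendTLSPolicy
metadata:
  name: backend-tls
  namespace: default
spec:
  targetRefs:
    - group: ""               # core group, i.e. a Kubernetes Service
      kind: Service
      name: my-service        # illustrative Service name
  validation:
    wellKnownCACertificates: System
    hostname: my-service.default.svc.cluster.local
```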

The current extension hook (Translation) for v3.Cluster is very primitive and does not provide any context for mutating the v3.Cluster xDS.

If we had an Envoy v3.Cluster per BackendReference, we could introduce a Cluster Modification hook in the extension server API and pass a list of Unstructured resources (corresponding to the policy resources EG watches for that BackendReference) as context for mutating the xDS.

muwaqar avatar Mar 17 '25 19:03 muwaqar

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

github-actions[bot] avatar Apr 16 '25 20:04 github-actions[bot]

The failover feature doesn't work for weighted clusters: https://github.com/envoyproxy/gateway/issues/5813

arkodg avatar Apr 25 '25 23:04 arkodg

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

github-actions[bot] avatar May 26 '25 00:05 github-actions[bot]

We discussed this issue in the community meeting today. We are revisiting this (treating a backendRef kind/namespace/name as common across all xRoutes) for users who do not care about the features highlighted here and instead want a single xDS cluster per backendRef to reduce scale (stats, cluster certs, active health checks).

  • This will most likely require an opt-in field at the EnvoyGateway or EnvoyProxy level (a purely illustrative sketch follows this list)
  • We will need a way to apply xDS Cluster settings to a backendRef
    • reuse BackendTrafficPolicy (ignore/reject route-specific fields)
    • create a new xPolicy that can only target a backendRef
    • enhance Backend
    • consider Route Delegation
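Purely to illustrate the opt-in idea above, and not an existing API: a hypothetical EnvoyProxy field might look like this (the clusterPerBackendRef field is invented for this sketch):

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: custom-proxy-config
  namespace: envoy-gateway-system
spec:
  # HYPOTHETICAL field, invented for this sketch: collapse clusters so that the
  # same backendRef (kind/namespace/name) shared across xRoutes maps to a single
  # xDS cluster instead of one cluster per route backendRef.
  clusterPerBackendRef: true
```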

Thanks @muwaqar for volunteering to investigate further.

arkodg avatar Nov 19 '25 01:11 arkodg

Just to add some observations:

  • We're investigating a performance impact that's related to the number of active clusters.
  • When comparing to Ambassador (another gateway), we can see the number of clusters is significantly lower (Ambassador creates a cluster per Service, so that makes sense).
  • Ambassador shows much less latency introduced at the gateway; at the 50th percentile, Envoy Gateway adds twice as much latency. We're still in the process of investigating this, but if the numbers are accurate, that's significant latency introduced at the gateway.

suhdev avatar Nov 25 '25 02:11 suhdev

@suhdev an increase in clusters is likely to increase memory and stats (which can be reduced using https://github.com/envoyproxy/gateway/pull/5898), but it shouldn't impact latency. Latency may be tied to an increase in enabled filters that may not have been enabled in your previous gateway, so I suggest comparing the filters on listeners and on routes across both.

arkodg avatar Nov 25 '25 02:11 arkodg

I missed the community meeting this week. Is there anything I can help out with here? Is @muwaqar taking on the initial API design for this?

jukie avatar Nov 26 '25 21:11 jukie

I am looking into this. @jukie, are you working on a specific timeline?

muwaqar avatar Nov 26 '25 21:11 muwaqar

No timeline, I just saw that it's marked for 1.7. Please let me know or reach out on Slack if there's anything you'd like to collaborate on!

jukie avatar Nov 26 '25 21:11 jukie

@arkodg

@suhdev an increase in clusters is likely to increase memory and stats (which can be reduced using https://github.com/envoyproxy/gateway/pull/5898), but it shouldn't impact latency. Latency may be tied to an increase in enabled filters that may not have been enabled in your previous gateway, so I suggest comparing the filters on listeners and on routes across both.

I think that having many clusters also de-optimizes connection pooling. Instead of one large pool of connections towards the upstream we get many small pools, so connection establishment is likely to happen more often, which impacts latency:

  • Connections need to be initialized in each pool
  • Smaller pools are likely to see less frequent activity and retire pooled connections, meaning they need to be re-established later on

I can't say that this is the driving factor for what @suhdev is seeing. I agree that it's more likely a change in filter chains, etc. that's making the largest impact here. But I think it's an additional consideration for the work @muwaqar is driving now.

guydc avatar Dec 12 '25 13:12 guydc