gateway icon indicating copy to clipboard operation
gateway copied to clipboard

Support routing to Service Cluster IP

Open arkodg opened this issue 2 years ago • 24 comments

arkodg avatar Sep 19 '23 20:09 arkodg

two ways to solve this

  • add a field within EnvoyProxy called disableEndpointRouting / enableClusterIPRouting which allows us to fallback to routing using Service Cluster IP
  • incorporate this into the load balancing API

arkodg avatar Sep 19 '23 20:09 arkodg

ptal @envoyproxy/gateway-maintainers

arkodg avatar Sep 19 '23 20:09 arkodg

chatted with @kflynn the other day about this and we both feel adding it in the load balancing API is more suitable

  • if we added it another API, userX may enable serviceIP at the EnvoyProxy level, user Y may set the LB Policy and then wonder why the LB policy isn't working. Adding it inside the LB Policy forces the user to be aware of the lb & routing decision to endpoints

arkodg avatar Sep 22 '23 17:09 arkodg

I am +1 on load balancing api, if we set this in EnvoyProxy, IMHO, this is on GC level and working for all managed Gateways, but if people just want this to work in specific Gateways, a Policy is more suitable.

Xunzhuo avatar Sep 22 '23 17:09 Xunzhuo

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

github-actions[bot] avatar Oct 22 '23 20:10 github-actions[bot]

this new ClusterIP load balancing type should live in https://github.com/envoyproxy/gateway/blob/main/api/v1alpha1/loadbalancer_types.go

arkodg avatar Feb 16 '24 02:02 arkodg

Hi @arkodg, i think this use case can be achieved by adding a fallbackToClusterIP field

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  namespace: default
  name: policy-for-route1
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: httproute-1
    namespace: default
  loadBalancer:
    type: LeastRequest
    slowStart:
      window: 300s
    fallbackToClusterIP: true # Enables fallback to service's cluster IP

deszhou avatar Mar 05 '24 14:03 deszhou

@yeedove, I personally prefer

  loadBalancer:
    type: ClusterIP

arkodg avatar Mar 05 '24 18:03 arkodg

Turns out that to get Envoy Gateway working well with Linkerd, we either need this or we need an l5d-dst-override header to be added to the request, so I would definitely like this. šŸ™‚

I'd personally vote for routingType: service or routingType: endpoint, but I'm good with @arkodg's type suggestion.

kflynn avatar Mar 07 '24 18:03 kflynn

please assign to me

deszhou avatar Mar 12 '24 14:03 deszhou

I like @kflynn's suggestion of a dedicated field routingType, if we take this approach, we'll have to use CEL to disable loadBalancing if routingType: Service

arkodg avatar Mar 16 '24 15:03 arkodg

  • we discussed this in the community meeting today, most were in favor of this field living in the BackendTrafficPolicy thanks for volunteering to pick this up @kflynn, temporarily assigning this to you for now here's a link to help you get started https://github.com/envoyproxy/gateway/blob/2a38de6067d421026994b8cbea202e399e68c98e/internal/gatewayapi/route.go#L1074 (you'll need to override the global EndpointRoutingDisabled field)

arkodg avatar Mar 28 '24 19:03 arkodg

rethinking this one, if the field lives in BackendTrafficPolicy, then it won't apply to other backendRefs in other places, e.g. ext_auth in SecurityPolicy, or ext_proc in EnvoyExtensionPolicy or openTelemetry sinks in EnvoyProxy . we may want to move this field into EnvoyProxy cc @zhaohuabing @guydc

arkodg avatar Apr 03 '24 15:04 arkodg

I guess this depends on do we need per-service configuration for the choice of cluster IP vs pod IP?

zhaohuabing avatar Apr 03 '24 15:04 zhaohuabing

one reason to select Service routing is for mesh case (sidecar beside gateway) to work seamlessly, so it will work for the xRoute backends (with sidecars ) but not if the ext auth / ext proc / Otel (have sidecars ) because it won't get intercepted properly in the gateway sidecar

arkodg avatar Apr 03 '24 15:04 arkodg

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

github-actions[bot] avatar May 03 '24 16:05 github-actions[bot]

Any progress on this? I’m looking to deploy EG alongside an Istio mesh and from what I’m hearing here and in the related ticket, I would need this feature to make it work?

jasonmoore2k avatar May 13 '24 13:05 jasonmoore2k

@kflynn still planning on working on this one ?

arkodg avatar May 13 '24 20:05 arkodg

@kflynn by setting the routingType: service would we lose the observability into each endpoint in the /clusters config & metrics and only display the ClusterIP?

@arkodg discussed the possibility of running an idle linkerd proxy sidecar on the gateway which would handle mTLS, but then it seems like we'd lose Linkerd multi-cluster service discovery.

ryanhristovski avatar May 16 '24 17:05 ryanhristovski

@kflynn by setting the routingType: service would we lose the observability into each endpoint in the /clusters config & metrics and only display the ClusterIP?

I think if service-mesh is being used, the metrics are likely going to be more useful from the mesh than from the ingress, because the mesh will handle service -> service as well as envoy -> service in the same format and namespace..

lnattrass avatar May 23 '24 21:05 lnattrass

I think @lnattrass has the right of it – part of the point of the mesh is that it should be handling both metrics and routing for you in this case.

kflynn avatar May 23 '24 21:05 kflynn

@arkodg I'd still like to, yes! so here's a question for you. There already is an EndpointRoutingDisabled field in the Translator; do we think that works as a global switch? or is a leftover untested vestigial thing? šŸ™‚

kflynn avatar May 23 '24 21:05 kflynn

@arkodg I'd still like to, yes! so here's a question for you. There already is an EndpointRoutingDisabled field in the Translator; do we think that works as a global switch? or is a leftover untested vestigial thing? šŸ™‚

@kflynn this should work as expected, the gatewayapi test should prove this easily, if the flag is set, the clusterIP will get set it in the IR

arkodg avatar May 23 '24 22:05 arkodg

@lnattrass we're not necessarily interested in Linkerd for routing on our ingress, but more so for cross-cluster service discovery and mTLS. So we'd prefer to keep the load balancing capabilities and metrics as close to our proxy as possible. In the past the CPU requirements of running 4k RPS+ on an ingress pod with linkerd over doubles our usage and since we've always used headless services we haven't relied on Linkerd's load balancing.

I suppose this is more of a niche use-case, but it does limit us on implementing multi-cluster Linkerd.

@kflynn would Linkerd be open to mirroring EndpointSlices as well or is it out of scope? I could open an issue (and potentially work on it) if you think it's worth considering

ryanhristovski avatar May 24 '24 14:05 ryanhristovski

Hi @kflynn I'm sorry to take this over from you, but we internally needed a fix earlier :) feel free to check out and comment my PR, though!

evacchi avatar Jun 05 '24 10:06 evacchi

@evacchi No apology needed! as you can see, I've been pulled onto other things, thank you for making this happen!! šŸ™‚

kflynn avatar Jun 27 '24 14:06 kflynn