gateway icon indicating copy to clipboard operation
gateway copied to clipboard

feat: Support gateway with headless envoy proxy service

Open davidalger opened this issue 1 year ago • 9 comments

What type of PR is this?

feat: Support gateway with headless envoy proxy service

What this PR does / why we need it:

Currently it's possible to ensure stable external IPs via either Gateway.Spec.Addresses or loadBalancerIP on the service but not internal facing cluster IPs.

This adds support for setting the clusterIP field on the proxy service to a stable address and configuring the proxy as a headless service when clusterIP: None is set on the EnvoyProxy config.

Since the clusterIP field cannot be mutated, this also implements delete/create when the configured value for clusterIP changes to ensure complete reconciliation of the change rather than return an error.

davidalger avatar Feb 05 '24 21:02 davidalger

Codecov Report

Attention: Patch coverage is 76.00000% with 6 lines in your changes missing coverage. Please review.

Project coverage is 64.46%. Comparing base (e3994ec) to head (66c1faf). Report is 593 commits behind head on main.

:exclamation: Current head 66c1faf differs from pull request most recent head ddf2419

Please upload reports for the commit ddf2419 to get more accurate results.

Files Patch % Lines
...ternal/infrastructure/kubernetes/infra_resource.go 0.00% 5 Missing and 1 partial :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2565      +/-   ##
==========================================
- Coverage   64.50%   64.46%   -0.04%     
==========================================
  Files         116      116              
  Lines       17790    17808      +18     
==========================================
+ Hits        11475    11480       +5     
- Misses       5573     5584      +11     
- Partials      742      744       +2     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Feb 05 '24 21:02 codecov[bot]

@davidalger can you share more about why making envoy svc a headless service is useful (setting ClusterIP to None) ? imo the control plane already deals with Endpoint IP translation if this is an edge case, it can be solved with https://github.com/envoyproxy/gateway/pull/2374

arkodg avatar Feb 05 '24 21:02 arkodg

@arkodg The primary use case for a headless service in context of this project is likely going to be for internal gateways where avoiding kube-proxy is desired and to implement client-side balancing of HTTP/2 requests to distribute them evenly across multiple envoy proxies acting as a routing layer in the request path.

With a normal ClusterIP service one will essentially have a basic L4 load balancer in front of the Envoy proxies since traffic sent to cluster IPs is routed via kube-proxy. So they are useful as the cluster IP isn't allocated, kube-proxy doesn't handle them and there is no load-balancing performed by k8s for headless services. On GKE specifically where VPC-native clusters have natively routable Pod IP addresses it's better to hit the IP of the pod directly rather than route through a cluster IP.

DNS for headless services is programmed a bit differently with A records being configured to return the IP address(es) of the Pods backing the service, allowing internal clients to route directly into the envoy proxies and also allows for internal non-Kubernetes consumers to use DNS discovery mechanisms to locate envoy proxy pod IPs.

In my case I also plan to use this simply to avoid allocating the cluster IP on external facing gateways configured with cloud.google.com/neg annotations where the resulting Standalone NEGs are attached (independently of the cluster) to global LBs handling TLS.

davidalger avatar Feb 05 '24 22:02 davidalger

if this is an edge case, it can be solved with https://github.com/envoyproxy/gateway/pull/2374

Looking at this again this morning and realizing the mentioned PR won't actually address the use-case of a headless service by itself as the Gateway will never reach a programed state without any addresses assigned to it as computeGatewayProgrammedCondition will simply return AddressNotAssigned until there is one or more address present on the service. In adding support for using a headless service the logic needs to be amended (as this PR does) to account for clusterIP == "None" in calculating the condition.

davidalger avatar Feb 06 '24 15:02 davidalger

if this is an edge case, it can be solved with #2374

Looking at this again this morning and realizing the mentioned PR won't actually address the use-case of a headless service by itself as the Gateway will never reach a programed state without any addresses assigned to it as computeGatewayProgrammedCondition will simply return AddressNotAssigned until there is one or more address present on the service. In adding support for using a headless service the logic needs to be amended (as this PR does) to account for clusterIP == "None" in calculating the condition.

won't this work, if svc type is LoadBalancer, but ClusterIP is none ? still think this use case fits well into the new mergePatch WIP enhancement

arkodg avatar Feb 06 '24 21:02 arkodg

won't this work, if svc type is LoadBalancer, but ClusterIP is none ? still think this use case fits well into the new mergePatch WIP enhancement

clusterIP: None is only valid with type: ClusterIP which makes sense since the intent is for a 'headless' service, which as the name implies does not have an LB of any kind (either kube-proxy or cloud-provider specific lb) in front of it.

The concept for mergePatch allowing users to configure fields without waiting for them to be added and exposed in a future release or to support uncommon use-cases such as hostNetwork is a good move that I tend to agree with, but it won't work for all use-cases such as this one where programming conditions are affected or reconciliation requires special handling to apply.

davidalger avatar Feb 07 '24 00:02 davidalger

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. Please feel free to give a status update now, ping for review, when it's ready. Thank you for your contributions!

github-actions[bot] avatar Mar 08 '24 04:03 github-actions[bot]

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. Please feel free to give a status update now, ping for review, when it's ready. Thank you for your contributions!

github-actions[bot] avatar Apr 07 '24 08:04 github-actions[bot]

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. Please feel free to give a status update now, ping for review, when it's ready. Thank you for your contributions!

github-actions[bot] avatar May 07 '24 12:05 github-actions[bot]

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. Please feel free to give a status update now, ping for review, when it's ready. Thank you for your contributions!

github-actions[bot] avatar Jun 06 '24 20:06 github-actions[bot]