gateway-api icon indicating copy to clipboard operation
gateway-api copied to clipboard

GEP: Standard Mechanism to Merge Multiple Gateways

Open dprotaso opened this issue 2 years ago • 24 comments

What would you like to be added:

Support for merging Gateways should have standard & documented mechanic.

Why this is needed:

Prior Discussions: https://github.com/kubernetes-sigs/gateway-api/discussions/1248, https://github.com/kubernetes-sigs/gateway-api/discussions/1246

Knative generates on demand per-service certificates using HTTP-01 challenges. There can be O(1000) Knative Services in the cluster which means we have O(1000) distinct certificates. The Gateway Resource is a contention point since is the only place to attach listeners to gateways with certificates.

The spec currently has language to indicate implemenations MAY merge Gateways resources but the mechanic isn't defined. https://github.com/kubernetes-sigs/gateway-api/blob/541e9fc2b3c2f62915cb58dc0ee5e43e4096b3e2/apis/v1beta1/gateway_types.go#L76-L78

Secondly, given similar use-cases/needs for control planes to manage listeners it might make sense that merging Gateways is part of extended conformance.

dprotaso avatar Feb 08 '23 16:02 dprotaso

+1 to this proposal.

  • This has been asked by users of Envoy Gateway to make their runtime deployments more efficient (currently a Gateway resource maps to a unique Data Plane (Envoy Proxy) deployment)

  • The spec says compatible listeners can be collapsed, but since Gateways can be applied dynamically
    it is bound to cause traffic disruption in the data plane if the implementation collapses or separates listener implementations in runtime.

  • If there is way to add a knob which sets constraints for the L3/L4 fields of the Listener such as ensuring a port must be unique across all Listeners within all Gateways within a GatewayClass, statically collapsing listeners in implementations should be doable. Below is the current spec https://github.com/kubernetes-sigs/gateway-api/blob/0177808c57566502f8fb697abd6d42c06008ff6a/apis/v1beta1/gateway_types.go#L72

arkodg avatar Feb 08 '23 18:02 arkodg

FWIW how we do this in Istio is https://istio.io/latest/docs/tasks/traffic-management/ingress/gateway-api/#manual-deployment. Basically the same way to apply 1 Gateway to a manually deployed Deployment/Service instead of automatically deployed one can be used to apply N Gateways and merge them.

If they want automated deployment and merging, they would make one "root" Gateway and attach all others. Actually that is another use case for https://github.com/kubernetes-sigs/gateway-api/pull/1596

howardjohn avatar Feb 08 '23 19:02 howardjohn

@howardjohn conformance tests dont pass when the implementation merges the listeners, https://github.com/envoyproxy/gateway/issues/349 has more context

arkodg avatar Feb 08 '23 19:02 arkodg

I don't think something implicitly deciding to collapse or not is a great idea which is the root cause of that. In the Istio way users have to explicitly opt into merging, which the conformance tests don't do, so we don't have an issue.

howardjohn avatar Feb 08 '23 19:02 howardjohn

I don't think something implicitly deciding to collapse or not is a great idea which is the root cause of that. In the Istio way users have to explicitly opt into merging, which the conformance tests don't do, so we don't have an issue.

yeah, Im hoping the intent to merge is defined by Gateway API, else implementations will add their own

arkodg avatar Feb 08 '23 19:02 arkodg

@shaneutt curious what does the label priority/awaiting-more-evidence mean and what's the lifecycle of issues in this sub project?

dprotaso avatar Feb 21 '23 22:02 dprotaso

@dprotaso we're in the midst of trying to better organize things based on priority. Given that 3 people are interested in this one, priority/awaiting-more-evidence was not really the right fit so I've changed it to priority/important-longterm. Please note that while these labels can be helpful for us to organize work and releases, they aren't necessarily indicative of the actual order in which things will be done. It's reasonable for someone with keen interest to champion an issue like this and take it straight into in-progress if they feel strongly about it.

shaneutt avatar Feb 21 '23 22:02 shaneutt

@arkodg

The spec says compatible listeners can be collapsed, but since Gateways can be applied dynamically it is bound to cause traffic disruption in the data plane if the implementation collapses or separates listener implementations in runtime.

Knative uses multiple Kubernetes Services to resolve port conflicts.

Our custom Istio installation has a single k8s deployment serving public traffic and internal traffic. This is because we create one K8s Service with type=LoadBalancer and the other being type=ClusterIP pointing to the same deployment but targetting different ports.

Thus if the Gateway resource had a scope property to define whether gateways were public (to the internet) and local to the cluster then I could forsee merging of

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: prod-internal
spec:
  gatewayClassName: acme-lb
  group: knative
  scope: Internal
  listeners:  
  - protocol: HTTP
    port: 80
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: prod-web
spec:
  gatewayClassName: acme-lb
  group: knative
  scope: Public
  listeners:  
  - protocol: HTTP
    port: 80

to be done by creating the following K8s Services

apiVersion: v1
kind: Service
metadata:
   name: prod-internal
spec:
  type: ClusterIP
  ports:
    - name: http2
      port: 80
      targetPort: 8081 # Dynamically allocated port
---
apiVersion: v1
kind: Service
metadata:
   name: prod-web
spec:
  type: LoadBalancer
  ports:
    - name: http2
      port: 80
      targetPort: 80 # Dynamically allocated port

Each gateway could return a unique IP in the status.addresses block.

dprotaso avatar Feb 21 '23 22:02 dprotaso

@dprotaso above case helps merge K8s deployments (improving efficiency), there is also a use case where some users might want to further merge gateways within a single K8s service, reducing the number of cloud LBs needed (ratio of cloud LB to gateways is 1: N)

arkodg avatar Feb 21 '23 23:02 arkodg

there is also a use case where some users might want to further merge gateways within a single K8s service

@arkodg Yeah we would benefit from that as well. Curious if you have any details/issues to link about user expectations around this feature?

dprotaso avatar Feb 23 '23 16:02 dprotaso

@dprotaso end users prefer lesser Cloud LBs / Services to reduce the management / operational complexity for managing the IPs, security, observailibity of these components

I think your proposed solution of introducing scope does solve the problem of merging Services if multiple Gateways have the same value of scope set.

Now if scope is of type string it is allowing the platform admins to create hints for the underlying implementation to group & isolate Gateway resources into domains in the infrastructure. Isnt that isolation handled at the GatewayClass level today :) ?

arkodg avatar Feb 23 '23 21:02 arkodg

I think my example of scope in https://github.com/kubernetes-sigs/gateway-api/issues/1713#issuecomment-1439184112 shows that the same infrastructure could serve different scoped Gateways.

Now if scope is of type string it is allowing the platform admins to create hints for the underlying implementation to group & isolate Gateway resources into domains in the infrastructure. Isnt that isolation handled at the GatewayClass level today :) ?

I think a type string makes sense cause then you can have standard scopes and vendored-prefixed ones.

it is allowing the platform admins to create hints for the underlying implementation to group & isolate Gateway resources into domains in the infrastructure. Isnt that isolation handled at the GatewayClass level today :) ?

I think there are two aspects to this issue

  1. When creating a Gateway how do you hint it should be merged with another (this helps scale Gateway/listener management)
  2. How do you map Gateways onto infrastructure.

I can see 1. being a distinct property on a Gateway - In my example I was using the property group so Gateways would declare themselves part of a set.

For 2. I think this would need to be part of the GatewayClass as it's implementation specific.

dprotaso avatar Feb 23 '23 22:02 dprotaso

yeah thinking more on this, I'm a +1 on either group or scope to allow users to bucket Gateways into domains. This field can be made more powerful than a Gateway annotation if the implementation / validation webhook in Gateway API ensures all Gateway listeners within a specific group are compatible

arkodg avatar Feb 23 '23 22:02 arkodg

I'm very -1 on some sort of group or scope field, that's explicitly what GatewayClass is supposed to be doing. It's intended that an implementation be able to reconcile multiple GatewayClasses with slightly different settings, that's why we used the controllerName field rather than explicitly specifying a single GatewayClass. Any time you think "we need a way to bucket these Gateways", we should be thinking of a GatewayClass thing first.

I think that @howardjohn's ideas in GEP-1762 (#1757) about using Address seem like a pretty viable way to handle both the manually-managed infrastructure use case, and the "how to specify a set of merged Gateways" use case. I had originally anticipated that "merge these Gateways" would be specified as a GatewayClass setting, but it seems like there may be too much shenanigans possible if we do it that way. I'm going to think more about it though.

youngnick avatar Feb 28 '23 00:02 youngnick

Related - @youngnick take a look at the discussion in the GEP - https://github.com/kubernetes-sigs/gateway-api/pull/1653

I have yet to update GEP contents will do that later tonight based on the discussion

dprotaso avatar Feb 28 '23 00:02 dprotaso

@dprotaso is this a separate GEP from #1653? Why so much discussion here before someone is assigned?

candita avatar Mar 06 '23 23:03 candita

It is a separate GEP, yes. But I think the eventual solution will be a combination of pieces from #1651 (PR #1653), and #1762 (PR #1757), rather than its own GEP.

youngnick avatar Mar 08 '23 04:03 youngnick

@dprotaso is this a separate GEP from https://github.com/kubernetes-sigs/gateway-api/pull/1653?

Yup - this GEP was to help solve the issue of managing/sharding a massive list of spec.listeners - see the prior discussions here - https://github.com/kubernetes-sigs/gateway-api/discussions/1248, https://github.com/kubernetes-sigs/gateway-api/discussions/1246

I think there are two ways to look at this problem - and they might warrant two GEPs

  1. As an end-user how do I logically merge different Gateways
  2. As an end-user how do I map Gateways to infrastructure

This GEP is taclking the first and I think #1762 is tackling the second (I have yet to go through it)

dprotaso avatar Mar 08 '23 19:03 dprotaso

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 20 '24 20:01 k8s-triage-robot

/remove-lifecycle stale

/lifecycle frozen

youngnick avatar Jan 22 '24 02:01 youngnick

FYI - the GEP PR has been updated to support cross-namespace. From my comment https://github.com/kubernetes-sigs/gateway-api/pull/1863#issuecomment-1927794180

I now have a use case that would warrant having a parent in a different namespace. Imagine the cluster operator persona is split into two parts - the cluster operator and the application operator.

The cluster operator provisions a 'parent' Gateway in some system namespace. They're responsible for installing/upgrading the underlying package that implements the Gateway API. They want to ensure costs are kept low so they want their Gateway instance to be shared with app operators. Thus there's only a single LB for the cluster.

The application operator owns a namespace and sets up the namespace to be used by app developers. The app operator will want to define a child Gateway with the right listeners and attaches the 'child' Gateway to 'parent' Gateway mentioned above.

The application developer creates HTTPRoute and attaches them to the child gateway in their namespace.

Some implementations have expressed interest:

  • Istio has expressed interest in this GEP (@howardjohn)
  • Envoy Gateway has implemented a merge gateway feature (https://gateway.envoyproxy.io/v0.6.0/api/extension_types/#envoyproxyspec) (@arkodg)

dprotaso avatar Feb 06 '24 16:02 dprotaso

thanks for restarting work on this GEP @dprotaso. This has been a popular ask for users in Envoy Gateway, and we finally ended up adding a knob (in our previous v0.6.0 release ) at the GatewayClass level (via the parametersRef custom resource called EnvoyProxy) called MergeGateways . This is primarily because there is a 1:1 mapping between GatewayClass and data plane infrastructure in Envoy Gateway today as all infrastructure specific fields like resources and replicas can only be configured via the custom resource thats attached to the GatewayClass

arkodg avatar Feb 06 '24 17:02 arkodg

We do need this, especially as several implementations are already doing "something" in this regard.

/triage accepted

However we are targeting early April for v1.1.0, and in review we don't expect that this will be ready or have had enough soak time for that timeline, so we're looking at v1.2.0 for this currently.

shaneutt avatar Mar 12 '24 23:03 shaneutt

/assign @dprotaso

dprotaso avatar May 13 '24 22:05 dprotaso

Leaving this open until the GEP graduates to GA

robscott avatar Jan 31 '25 20:01 robscott

Hello ✋

I'm interested in GEP-1713 (ListenerSets) and wondering about its path to Implementable status.

From what I can see, it seems like the main remaining challenges might be:

  1. Feature naming - No formal feature name is currently defined
  2. Conformance testing - Tests aren't available yet for this feature
  3. Implementation validation - Possibly needs broader implementation support

Is this an accurate assessment of what's blocking the progression to Implementable? Are there other specific requirements or challenges I'm missing?

Thank you 🙏

kahirokunn avatar Jul 02 '25 00:07 kahirokunn

Hmm, it's actually Experimental at the moment, which is a later step than Implementable.

The steps you describe are totally correct for moving this to Graduated (aka GA) though.

youngnick avatar Jul 02 '25 01:07 youngnick

I know Istio has implemented the API. I'll be looking at writing conformance in August (on vacation starting mid-July)

dprotaso avatar Jul 02 '25 01:07 dprotaso

@kahirokunn Istio's implementation was done in https://github.com/istio/istio/pull/55595 if you wanted to take a look!

mikemorris avatar Jul 02 '25 15:07 mikemorris

Thanks🥹

kahirokunn avatar Jul 02 '25 15:07 kahirokunn