KEP-1645: define dual stack policies and fields
- One-line PR description: Define dual stack recommendations and fields
- Issue link: #1645
- Other comments: This does three things: it defines initial recommendations for what an implementation may do to support dual-stack services, fixes the max items for the `ips` field (which is already fixed in the actual CRD), and adds an `ipFamilies` field matching the same field on Service, which implementations may use to reconcile this globally with an implementation-defined policy.
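For context, a rough sketch of the shape being discussed follows. This is illustrative only; the exact field names, validation markers, and max-items value live in the mcs-api PR, and the `2` used here is an assumption based on dual-stack exposing at most two families.

```go
// Illustrative sketch only, not the KEP text: the rough shape of the proposed
// addition, mirroring Service's .spec.ipFamilies. Marker values are assumptions.
package v1alpha1

import corev1 "k8s.io/api/core/v1"

type ServiceImportSpec struct {
	// ips contains the VIP(s) of the imported service, one per exposed IP family.
	// +kubebuilder:validation:MaxItems:=2
	IPs []string `json:"ips,omitempty"`

	// ipFamilies lists the IP families exposed by this import, in the same
	// order as ips. How the families of the exported Services are merged into
	// this list is implementation-defined.
	// +kubebuilder:validation:MaxItems:=2
	IPFamilies []corev1.IPFamily `json:"ipFamilies,omitempty"`

	// ... the existing ServiceImport fields (ports, type, sessionAffinity, ...)
	// are unchanged and omitted here.
}
```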
Triage note: There was some discussion around comments from @mikemorris at the last SIG-MC meeting; can you please add them to this PR so we can talk about them?
Hi @mikemorris, are you still planning to comment here about your concerns with this change?
Thanks! Quickly answering those points, but we can have a longer discussion in the SIG meeting if you are available there.
Trying to capture a high-level concern I have with this direction:
1. Using a Service is just one possible (albeit common) implementation of MCS - alternative explorations such as [ClusterIP Gateways](https://github.com/kubernetes-sigs/gateway-api/pull/3608) are currently being developed and may be an option in the future. Service is a bloated resource and in general I would have a strong preference towards avoiding leaking what should be implementation details up into the actual spec resources.
IIUC all the implementations work with a Service in one way or another; we cannot block PRs to ServiceImport by saying that, hypothetically, another alternative to Service, one which is very much experimental and that (AFAIK) doesn't even have consensus among SIG Network/Gateway API folks, is the way forward. If we were to do that we could probably also forget about bumping MCS-API to v1beta1, for instance...
2. I don't think adding this field to ServiceImport should actually be necessary.
   1. On Service, it is optionally configured by a service owner in conjunction with [`.spec.ipFamilyPolicy`](https://kubernetes.io/docs/concepts/services-networking/dual-stack/#services) to specify which IP family or families should be made available for a Service on a dual-stack cluster, and which family should be used for the legacy `.spec.clusterIP` field.
   2. In contrast to Service, on a ServiceImport no discretion or decision-making should be required: the available IP families will be determined by the IP families of the Services exposed by ServiceExports, and may be constrained by the IP family configuration of the importing cluster. The exported Services _may_ be on clusters with different dual-stack configurations (some IPv4-only, some IPv6-only, some dual-stack) and _may_ have different configurations for which IPs are available for each Service in each cluster. I believe determining the appropriate dual-stack configuration should be possible by watching the associated exported Service resources (and their EndpointSlices) directly, from either a centralized controller _or_ per-cluster decentralized API server watches.
In Cilium at least we do not have this info on the controller creating the derived Service. This controller is intentionally not connected to all the other clusters; it only knows about the local ServiceImport and the (possibly not yet created) derived Service. As an optimization, we also do not always sync EndpointSlices from remote clusters (only if there is a specific annotation or if the ServiceImport is headless). So I need this info already merged from all clusters/"reconciled" on the ServiceImport resource directly.
3. I don't see this field as being helpful for order-of-operations concerns in creating an implementation Service resource, because at any point the available endpoints for a ServiceImport may _change_: an IPv6-only cluster may go offline while an IPv4-only cluster remains available, and the ServiceImport in a dual-stack cluster should likely be updated to drop its IPv6 address from the `ips` field if the topology of an implementation requires direct/flat networking and no IPv6 endpoints are available (cleaning up and removing the ServiceImport entirely if no backends are routable from the importing cluster is a viable alternative too). Similarly, if an exported Service on a new IP family becomes available when it wasn't originally, the ServiceImport should likely be updated to publish an address in the `ips` field for the newly-available family when adding the backends.
Yep, the ServiceImport ipFamilies may change and we do plan to reflect that in the IPs. I am not sure how you see this as unhelpful, since you described pretty much what we are going to do... Also note that in our case we want some global consistency on which IP families end up there, so we will essentially take the intersection of all the exported Services' ipFamilies, restricted to what is supported by the local cluster.
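To make that concrete, here is a minimal sketch of that merge policy (intersection across exported Services, then restricted to what the importing cluster supports). The function names are hypothetical; this is not Cilium's actual code nor text from the KEP.

```go
// Minimal sketch of the merge policy described above: intersect the ipFamilies
// of every exported Service, then keep only families the local cluster supports.
// Names are hypothetical, not part of the KEP or of any implementation.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// mergeIPFamilies returns the families present in every exported Service,
// in the order of the first (oldest) export, filtered by local cluster support.
func mergeIPFamilies(exports [][]corev1.IPFamily, localSupport []corev1.IPFamily) []corev1.IPFamily {
	if len(exports) == 0 {
		return nil
	}
	supported := map[corev1.IPFamily]bool{}
	for _, f := range localSupport {
		supported[f] = true
	}
	inAll := func(f corev1.IPFamily) bool {
		for _, families := range exports[1:] {
			found := false
			for _, g := range families {
				if g == f {
					found = true
					break
				}
			}
			if !found {
				return false
			}
		}
		return true
	}
	var out []corev1.IPFamily
	for _, f := range exports[0] { // order dictated by the oldest export
		if inAll(f) && supported[f] {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	merged := mergeIPFamilies(
		[][]corev1.IPFamily{
			{corev1.IPv6Protocol, corev1.IPv4Protocol}, // oldest export: dual-stack, IPv6 primary
			{corev1.IPv4Protocol, corev1.IPv6Protocol},
		},
		[]corev1.IPFamily{corev1.IPv4Protocol, corev1.IPv6Protocol}, // importing cluster is dual-stack
	)
	fmt.Println(merged) // [IPv6 IPv4]
}
```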
4. I think what may be helpful instead is clarifying expected behavior in the various scenarios @MrFreezeex had laid out in the presentation, and possibly encoding those in conformance tests and/or a `status` field indicating that a ServiceImport is "ready" (has backends available which are reachable from the importing cluster, which may have different constraints or meaning in centralized vs decentralized implementations), rather than expecting the ServiceImport and any supporting infra to be created (and destroyed) synchronously and be routable immediately at creation.
I was planning to do this in a second step/PR because, while the initial use case might be tied to this PR, some possible conditions on the ServiceImport might be relevant for other things (like reporting any errors related to the import :man_shrugging:). I am not sure that, besides checking that the ServiceImport is ready, we can do much more in the conformance tests though.
I am not entirely sure what you mean about the behavior of "ready" and "to be created (and destroyed) synchronously and be routable immediately at creation"; to me it would be more a place to put any error state, whether that's something very generic like ready or some more specific implementation-defined errors if there's a need for that.
In Cilium at least we do not have this info on the controller creating the derived Service. This controller is intentionally not connected to all the other clusters; it only knows about the local ServiceImport and the (possibly not yet created) derived Service. As an optimization, we also do not always sync EndpointSlices from remote clusters (only if there is a specific annotation or if the ServiceImport is headless). So I need this info already merged from all clusters/"reconciled" on the ServiceImport resource directly.
Okay this is the implementation detail/constraint I had not been familiar with, will need to think about this more.
Triage notes:
- group still wants to follow up on conversation in https://github.com/kubernetes/enhancements/pull/5264#issuecomment-3001186304
Submariner could also make use of the ipFamilies field on the ServiceImport. As you're probably aware, Submariner does not have a centralized controller, and the constituent clusters do not have direct access to one another. So, re: the statement above that "determining the appropriate dual-stack configuration should be possible by watching the associated exported Service resources", we cannot do that. Each cluster communicates its local service information to the other clusters via a ServiceImport published through a shared hub cluster. Each cluster then has all the constituent service information needed to check for conflicts, apply a condition on its local ServiceExport, and create a derived/aggregated ServiceImport (note that we don't create a derived Service).
/lgtm
There's a handful of details/nuance to consider, but I think I'm largely convinced it makes sense to add ipFamilies.
- "The .spec.ipFamilies field is conditionally mutable: you can add or remove a secondary IP address family, but you cannot change the primary IP address family of an existing Service."
- https://kubernetes.io/docs/concepts/services-networking/dual-stack/
- We should likely extend this constraint to ServiceImport.
- I would expect that which family is set as primary on a ServiceImport should be inferred from the intersection of cluster configuration and available endpoints if applicable.
- For setting `.spec.ipFamilyPolicy` on a derived Service, I think implementations should prefer taking a looser approach, as this may differ across exporting clusters (as described in @MrFreezeex's presentation/diagram), and thus it may not make sense to directly propagate it and consider potential conflicts?
  - As an example, when exporting a Service with a `RequireDualStack` ipFamilyPolicy from a dual-stack cluster, it might make sense to be more flexible when creating a corresponding ServiceImport on a single-stack cluster (unless we really think this configuration should prevent creating a ServiceImport on single-stack clusters).
Triage notes:
- Want to limit the "shoulds" when it comes to things like whether the inference should be from an intersection, from a union, ranked a certain way, etc...
- Want to link to the Service dual stack docs because our overarching goal is to make MCS feel like an extension of Service
- The hard edge case is when a clusterset has heterogeneous ipFamilies, and that is what we have to articulate ourselves. We also need to be explicit that the implementer has to decide how to handle that per their implementation.
Thanks for the reviews @mikemorris!
There's a handful of details/nuance to consider, but I think I'm largely convinced it makes sense to add ipFamilies.
"The .spec.ipFamilies field is conditionally mutable: you can add or remove a secondary IP address family, but you cannot change the primary IP address family of an existing Service."
- https://kubernetes.io/docs/concepts/services-networking/dual-stack/
I tried to reference that more in my latest version, let me know if this looks better now 👀
- We should likely extend this constraint to ServiceImport.
As I was saying a bit in the call, we should probably keep this implementation-defined. For instance, as a result of merging ipFamilies you might want to change the order of what's in the ipFamilies field, and I am not sure we should prevent that. For a derived-Service-style implementation you could just swap the order of the IPs on the ServiceImport while your derived Service stays as is, or even (not specific to derived-Service implementations) just express that the result of the merge across your clusters should be in this order, but keep the IPs' order as is because you don't want to change it.
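As an illustration of the "swap the order of the IPs on the ServiceImport while the derived Service stays as is" option, here is a small hedged sketch (the function name and approach are made up for illustration, not specified by the KEP) that orders an `ips` list to follow a given ipFamilies order:

```go
// Hypothetical sketch: reorder a ServiceImport's ips to follow a merged
// ipFamilies order, without recreating anything. Not part of the KEP text.
package main

import (
	"fmt"
	"net/netip"

	corev1 "k8s.io/api/core/v1"
)

// orderIPsByFamily returns ips sorted so they follow the order of families.
func orderIPsByFamily(ips []string, families []corev1.IPFamily) []string {
	var out []string
	for _, fam := range families {
		for _, ip := range ips {
			addr, err := netip.ParseAddr(ip)
			if err != nil {
				continue // skip unparsable entries
			}
			if (fam == corev1.IPv4Protocol && addr.Is4()) ||
				(fam == corev1.IPv6Protocol && addr.Is6() && !addr.Is4In6()) {
				out = append(out, ip)
			}
		}
	}
	return out
}

func main() {
	ips := []string{"10.96.0.10", "fd00::10"}
	// The merged order says IPv6 is primary: only the ServiceImport ordering changes.
	fmt.Println(orderIPsByFamily(ips, []corev1.IPFamily{corev1.IPv6Protocol, corev1.IPv4Protocol}))
	// Output: [fd00::10 10.96.0.10]
}
```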
- I would expect that which family is set as primary on a ServiceImport should be inferred from the intersection of cluster configuration and available endpoints if applicable.
For us in Cilium we will most certainly do that, although some implementations might prefer a union instead, or simply whatever the local cluster supports because there is some magic gateway that does intelligent things and somehow doesn't care about the IP protocol. A while ago we were saying in a SIG-MC meeting that we shouldn't dictate that in the KEP (and I recognize that you are not saying here that we should, I am just saying that out loud).
- For setting `.spec.ipFamilyPolicy` on a derived Service, I think implementations should prefer taking a looser approach, as this may differ across exporting clusters (as described in @MrFreezeex's presentation/diagram), and thus it may not make sense to directly propagate it and consider potential conflicts?
Yes, I haven't implemented that yet, but I think what we would most likely do in Cilium (roughly sketched after this list) is:
- Start with the order of the ipFamilies of the ServiceImport at creation (which would be the intersection of all the exported Services' ipFamilies, with the order dictated by the oldest one if dual-stack)
- If there's any change to the order on the ServiceImport, do not attempt to recreate the derived Service, but just change the order of the IPs in the ServiceImport in the logic that propagates IPs from the derived Service to the ServiceImport
- If there's an IP protocol to add or to remove, do that (we also know what the cluster supports since we are the CNI, but if you are not the CNI you could potentially ask users to pass that as a config to your project I guess)
- If the IP protocol to remove is the primary IP of the derived Service, I am not sure yet; maybe we will recreate the derived Service since we do that for changing the headless-ness, but this might change / we might improve the logic for headless-ness too
EDIT: My latest thinking is that we would most likely add an annotation on the derived Service on our side to mark the "real IP protocol" that we consider, which means we could remove a protocol from this annotation while not necessarily removing it from the true ipFamilies field. This is again implementation-defined territory, just mentioning what Cilium might do with all of this!
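To give one concrete (and purely hypothetical) reading of that "looser approach", here is a small sketch of picking a derived Service's ipFamilyPolicy from the merged families instead of propagating the exporters' policies; the function name and the PreferDualStack choice are assumptions, not Cilium's actual code nor anything mandated by the KEP:

```go
// Hedged sketch: derive an ipFamilyPolicy for a derived Service from the
// ServiceImport's merged ipFamilies rather than from the exported Services'
// own policies. Purely illustrative.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func derivedIPFamilyPolicy(families []corev1.IPFamily) corev1.IPFamilyPolicy {
	if len(families) >= 2 {
		// PreferDualStack rather than RequireDualStack keeps the derived
		// Service creatable even if the local cluster is single-stack.
		return corev1.IPFamilyPolicyPreferDualStack
	}
	return corev1.IPFamilyPolicySingleStack
}

func main() {
	fmt.Println(derivedIPFamilyPolicy([]corev1.IPFamily{corev1.IPv4Protocol, corev1.IPv6Protocol}))
	// Output: PreferDualStack
}
```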
- As an example, when exporting a Service with a `RequireDualStack` ipFamilyPolicy from a dual-stack cluster, it might make sense to be more flexible when creating a corresponding ServiceImport on a single-stack cluster (unless we really think this configuration should prevent creating a ServiceImport on single-stack clusters).
Yep, this would be supported in our case. I am not sure of this yet, but maybe what we would end up doing is (see the sketch after this list):
- Keep the unsupported ipFamily in the ServiceImport ipFamilies
- If none of them are supported by the local cluster, report that in the ready condition
- Filter out the ipFamilies that are not supported when creating the derived Service/reporting the IPs to the ServiceImport (and possibly add some kind of condition saying that this happened, but I am not sure whether we should do this or not)
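A rough sketch of that filtering and condition reporting, under the assumption of a generic "Ready"-style condition; the condition type and reason strings are made up for illustration and are not part of the KEP:

```go
// Rough sketch of the behavior listed above: filter out families the local
// cluster cannot serve, and surface a condition if nothing usable is left.
// The condition type/reason strings are hypothetical.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// filterSupported keeps only the families the importing cluster supports.
func filterSupported(families, localSupport []corev1.IPFamily) []corev1.IPFamily {
	supported := map[corev1.IPFamily]bool{}
	for _, f := range localSupport {
		supported[f] = true
	}
	var out []corev1.IPFamily
	for _, f := range families {
		if supported[f] {
			out = append(out, f)
		}
	}
	return out
}

// readyCondition reports whether any usable family remains.
func readyCondition(usable []corev1.IPFamily) metav1.Condition {
	if len(usable) == 0 {
		return metav1.Condition{
			Type:    "Ready", // hypothetical condition type
			Status:  metav1.ConditionFalse,
			Reason:  "NoSupportedIPFamily",
			Message: "no exported IP family is supported by the importing cluster",
		}
	}
	return metav1.Condition{Type: "Ready", Status: metav1.ConditionTrue, Reason: "IPFamiliesAvailable"}
}

func main() {
	// Merged families say IPv6 only, but the importing cluster is IPv4-only.
	usable := filterSupported(
		[]corev1.IPFamily{corev1.IPv6Protocol},
		[]corev1.IPFamily{corev1.IPv4Protocol},
	)
	fmt.Println(readyCondition(usable).Reason) // NoSupportedIPFamily
}
```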
/lgtm
Triage notes:
- Mike will do a triple check since several changes are in reaction to his comments in the first place
- But overall, this and its partner PR in the mcs-api repo are in very good shape in terms of being ready for approval, according to us in the scrub. cc @skitt @JeremyOT
/lgtm
Appreciate the thoughtfulness in considering all those scenarios/edge cases @MrFreezeex, I think with that context this is in good shape to merge.
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: MrFreezeex, skitt
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~keps/sig-multicluster/OWNERS~~ [skitt]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment