Feature Request: Provide an MCS Controller Implementation Using Sveltos’s Event Framework
Proposal
We propose a feature (or guideline documentation) illustrating how to implement an MCS (Multi-Cluster Services) controller using Sveltos’s Event Framework.
The solution adheres to KEP-1645, following the principle of namespace sameness. It manages ClusterIP, Headless, and LoadBalancer/NodePort services consistently, while explicitly disallowing ExternalName services.
Background
The Kubernetes MCS API standardizes how services are exported from one cluster and discovered in others. Pairing it with Sveltos’s Event Framework lets us fully automate the pipeline from a ServiceExport to the derived Service, EndpointSlice, and ServiceImport objects, which removes the burden of manually provisioning these resources across multiple clusters.
Benefits
- Fully Automated Creation: Eliminates manual steps for provisioning MCS resources across clusters.
- Unified Approach for Multiple Service Types: ClusterIP, NodePort, and LoadBalancer services are all converted consistently (with optional special handling for headless).
- Clean Resource Ownership: OwnerReferences let Kubernetes handle garbage collection automatically, simplifying lifecycle management.
- DNS-Friendly for Headless Services: CoreDNS can resolve these exported headless services, thanks to the EndpointSlice objects.
- Extensible Design: Sveltos’s Event Framework can be extended to track additional resource states or integrate with other multi-cluster components if needed.
By implementing the strategies described in the guide below with Sveltos’s Event Framework, platform engineers can reliably export services from any source cluster and consume them with minimal configuration overhead. This accelerates multi-cluster use cases (high availability, traffic optimization, or cross-environment integrations) without inventing a proprietary approach.
MCS Controller Implementation Guide Using Sveltos
Overview
This guide details how to automate the implementation of the Kubernetes Multi-Cluster Services (MCS) API using Sveltos's Event Framework.
Based on KEP-1645 and the principle of namespace sameness, we present accurate conversion patterns for each service type.
In this revised document, we introduce the concept of creating derived Services named derived-$hashServiceExport, where $hash is computed from the ServiceExport name. Refer to the following implementation for hashing:
https://github.com/kubernetes-sigs/mcs-api/blob/b4f72b8c11b640b049a2c247994a2de3eb0dda75/pkg/controllers/common.go#L39-L43
We also create one EndpointSlice for each source cluster associated with the ServiceExport, named derived-$hash-$clusterId. For clarity, the label multicluster.kubernetes.io/service-name: <clusterset service name> is added to both the EndpointSlice and the derived Service. Additionally, we establish OwnerReferences from the ServiceImport to the Service, and from the Service to the EndpointSlice.
Processing Patterns by Service Type
For reference: https://github.com/kubernetes/enhancements/blob/master/keps/sig-multicluster/1645-multi-cluster-services-api/README.md#clusterset-service-behavior-expectations
1. ClusterIP / LoadBalancer / NodePort Services
These service types can be handled with the same processing pattern. Below is an example:
Original Service in Source Cluster (ClusterID: cluster-a)
apiVersion: v1
kind: Service
metadata:
  name: web-service
  namespace: default
spec:
  type: ClusterIP # or LoadBalancer or NodePort
  selector:
    app: web
  ports:
    - name: http
      port: 80
      targetPort: 8080
ServiceExport in Source Cluster (ClusterID: cluster-a)
apiVersion: multicluster.k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: web-service
  namespace: default
ClusterIP Service generated in cluster-a and cluster-b
apiVersion: v1
kind: Service
metadata:
  name: derived-$hashServiceExport
  namespace: default
  labels:
    multicluster.kubernetes.io/service-name: web-service
    multicluster.kubernetes.io/service-imported: "true"
    app.kubernetes.io/managed-by: sveltos
  ownerReferences:
    - apiVersion: multicluster.k8s.io/v1alpha1
      kind: ServiceImport
      name: web-service
      # other fields (uid, controller, blockOwnerDeletion) required by OwnerReference
spec:
  type: ClusterIP
  selector: # Selector is maintained based on namespace sameness
    app: web
  ports:
    - name: http
      port: 80
      targetPort: 8080
ServiceImport generated in cluster-a and cluster-b
apiVersion: multicluster.k8s.io/v1alpha1
kind: ServiceImport
metadata:
  name: web-service
  namespace: default
  annotations:
    multicluster.kubernetes.io/derived-service: derived-$hashServiceExport
spec:
  type: ClusterSetIP
  ports:
    - name: http
      port: 80
      protocol: TCP
  ips:
    # Cluster IP assigned to the derived Service,
    # e.g. kubectl get svc derived-$hashServiceExport -o yaml | yq '.spec.clusterIP'
    - "10.96.0.1"
status:
  clusters:
    - cluster: cluster-a
EndpointSlice generated in cluster-a and cluster-b
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: derived-$hashServiceExport-cluster-a
  namespace: default
  labels:
    kubernetes.io/service-name: derived-$hashServiceExport
    multicluster.kubernetes.io/service-name: web-service
    cluster.x-k8s.io/cluster-name: cluster-a
    endpointslice.kubernetes.io/managed-by: sveltos
  ownerReferences:
    - apiVersion: multicluster.k8s.io/v1alpha1
      kind: ServiceImport
      name: web-service
      # other fields (uid, controller, blockOwnerDeletion) required by OwnerReference
addressType: IPv4
ports:
  - name: http
    protocol: TCP
    port: 8080 # endpoint port, i.e. the targetPort of the exported Service
endpoints:
  - addresses:
      - "10.0.1.1"
    conditions:
      ready: true
    nodeName: node-a
  - addresses:
      - "10.0.1.2"
    conditions:
      ready: true
    nodeName: node-b
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: derived-$hashServiceExport-cluster-b
  namespace: default
  labels:
    kubernetes.io/service-name: derived-$hashServiceExport
    multicluster.kubernetes.io/service-name: web-service
    cluster.x-k8s.io/cluster-name: cluster-b
    endpointslice.kubernetes.io/managed-by: sveltos
  ownerReferences:
    - apiVersion: multicluster.k8s.io/v1alpha1
      kind: ServiceImport
      name: web-service
      # other fields (uid, controller, blockOwnerDeletion) required by OwnerReference
addressType: IPv4
ports:
  - name: http
    protocol: TCP
    port: 8080 # endpoint port, i.e. the targetPort of the exported Service
endpoints:
  - addresses:
      - "10.0.1.3"
    conditions:
      ready: true
    nodeName: node-a
  - addresses:
      - "10.0.1.4"
    conditions:
      ready: true
    nodeName: node-b
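The conversion from an exported Service to its derived Service could look roughly like the sketch below. It works on plain dictionaries shaped like the manifests in this section; the function name, the `import_uid` parameter, and the choice to drop `nodePort` values (they are cluster-specific) are assumptions made for illustration and are not part of Sveltos or the MCS API.

```python
from copy import deepcopy

SERVICE_NAME_LABEL = "multicluster.kubernetes.io/service-name"


def build_derived_service(source: dict, derived_name: str, import_uid: str) -> dict:
    """Build the derived ClusterIP Service for an exported Service manifest."""
    spec = source["spec"]
    if spec.get("type", "ClusterIP") == "ExternalName":
        raise ValueError("ExternalName services cannot be exported")

    # Assumption: nodePort values only make sense in the source cluster,
    # so they are not copied to the derived Service.
    ports = [
        {key: value for key, value in port.items() if key != "nodePort"}
        for port in spec.get("ports", [])
    ]
    meta = source["metadata"]
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": derived_name,
            "namespace": meta["namespace"],
            "labels": {
                SERVICE_NAME_LABEL: meta["name"],
                "multicluster.kubernetes.io/service-imported": "true",
                "app.kubernetes.io/managed-by": "sveltos",
            },
            # The ServiceImport owns the derived Service, so deleting the
            # import lets Kubernetes garbage-collect the Service.
            "ownerReferences": [{
                "apiVersion": "multicluster.k8s.io/v1alpha1",
                "kind": "ServiceImport",
                "name": meta["name"],
                "uid": import_uid,
                "controller": True,
                "blockOwnerDeletion": True,
            }],
        },
        "spec": {
            # LoadBalancer and NodePort are both converted to ClusterIP.
            "type": "ClusterIP",
            # The selector is kept, following namespace sameness.
            "selector": deepcopy(spec.get("selector", {})),
            "ports": ports,
        },
    }
```

Treating all three service types in one code path keeps the controller free of per-type branching; only ExternalName is rejected up front.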
2. Headless Services (clusterIP: None)
Headless services require special handling, typically relying on DNS-based service discovery instead of a VIP (Virtual IP). Initially, one might assume an MCS controller only needs to create ServiceImport objects; however, since CoreDNS uses EndpointSlices as a record source (reference: https://github.com/coredns/multicluster/blob/49f47d950355f793d656aec8a6d198daf1d888b1/multicluster.go#L347-L381), the controller must also create EndpointSlices for each source cluster for correct DNS-based service discovery.
Original Service in Source Cluster (ClusterId: cluster-a)
apiVersion: v1
kind: Service
metadata:
  name: stateful-service
  namespace: default
spec:
  clusterIP: None
  selector:
    app: stateful
  ports:
    - name: http
      port: 80
      targetPort: 8080
ServiceExport in Source Cluster (ClusterId: cluster-a)
apiVersion: multicluster.k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: stateful-service
  namespace: default
ServiceImport generated in cluster-a and cluster-b
apiVersion: multicluster.k8s.io/v1alpha1
kind: ServiceImport
metadata:
  name: stateful-service
  namespace: default
  annotations:
    multicluster.kubernetes.io/derived-service: derived-$hashServiceExport
spec:
  type: Headless
  ports:
    - name: http
      port: 80
      protocol: TCP
status:
  clusters:
    - cluster: cluster-a
EndpointSlice generated in cluster-a and cluster-b
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: derived-$hashServiceExport-cluster-a
  namespace: default
  labels:
    kubernetes.io/service-name: stateful-service
    multicluster.kubernetes.io/service-name: stateful-service
    cluster.x-k8s.io/cluster-name: cluster-a
    endpointslice.kubernetes.io/managed-by: sveltos
  ownerReferences:
    - apiVersion: multicluster.k8s.io/v1alpha1
      kind: ServiceImport
      name: stateful-service
      # other fields (uid, controller, blockOwnerDeletion) required by OwnerReference
addressType: IPv4
ports:
  - name: http
    protocol: TCP
    port: 8080 # endpoint port, i.e. the targetPort of the exported Service
endpoints:
  - addresses:
      - "10.0.2.1"
    conditions:
      ready: true
    nodeName: node-c
  - addresses:
      - "10.0.2.2"
    conditions:
      ready: true
    nodeName: node-d
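Since the EndpointSlice has the same shape for headless and ClusterIP exports, a single builder can serve both cases. The sketch below is illustrative only: the function name, its parameters, and the `pods` record format (`ip`, `node`, `ready`) are hypothetical, and the labels simply mirror the manifests shown in this guide.

```python
def build_endpoint_slice(slice_name, namespace, export_name, backing_service,
                         cluster_id, ports, pods):
    """Build the EndpointSlice that carries one source cluster's endpoints.

    ports: list of {"name", "port", optional "protocol"} describing the
           ports the endpoints listen on.
    pods:  list of {"ip", "node", "ready"} records (hypothetical format).
    """
    return {
        "apiVersion": "discovery.k8s.io/v1",
        "kind": "EndpointSlice",
        "metadata": {
            "name": slice_name,
            "namespace": namespace,
            "labels": {
                # Service whose endpoints this slice contributes to.
                "kubernetes.io/service-name": backing_service,
                # Clusterset service name, as required by the KEP.
                "multicluster.kubernetes.io/service-name": export_name,
                "cluster.x-k8s.io/cluster-name": cluster_id,
                "endpointslice.kubernetes.io/managed-by": "sveltos",
            },
        },
        "addressType": "IPv4",
        "ports": [
            {
                "name": port["name"],
                "protocol": port.get("protocol", "TCP"),
                "port": port["port"],
            }
            for port in ports
        ],
        "endpoints": [
            {
                "addresses": [pod["ip"]],
                "conditions": {"ready": pod["ready"]},
                "nodeName": pod["node"],
            }
            for pod in pods
        ],
    }
```

Calling the builder once per source cluster yields the one-slice-per-cluster layout described earlier; owner references are left out here and would be added by the caller.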
3. ExternalName Services
ExternalName services cannot be exported.
Original Service in Source Cluster (ClusterId: cluster-a)
apiVersion: v1
kind: Service
metadata:
  name: external-db
  namespace: default
spec:
  type: ExternalName
  externalName: db.example.com
ServiceExport (ClusterId: cluster-a)
apiVersion: multicluster.k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: external-db
  namespace: default
This ServiceExport will fail with the following status:
apiVersion: multicluster.k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: external-db
  namespace: default
status:
  conditions:
    - type: InvalidService
      status: "True"
      reason: UnsupportedServiceType
      message: "ExternalName services cannot be exported"
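A validation step producing the status above might be sketched like this. The function name is hypothetical, and the condition fields are copied from the example status in this guide rather than taken from a specific controller implementation.

```python
from typing import Optional


def export_condition(service: dict) -> Optional[dict]:
    """Return a failure condition for an unsupported Service, else None."""
    # A Service without an explicit type defaults to ClusterIP.
    service_type = service.get("spec", {}).get("type", "ClusterIP")
    if service_type == "ExternalName":
        return {
            "type": "InvalidService",
            "status": "True",
            "reason": "UnsupportedServiceType",
            "message": "ExternalName services cannot be exported",
        }
    return None
```

Running this check before any derived object is created means an invalid export never leaves partial resources behind in the other clusters.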
Key Implementation Points
- Selector Maintenance
  - Following namespace sameness principles, all imported Services maintain their selectors.
  - This enables Pods with matching labels in the importing cluster to be automatically added as service endpoints when applicable.
- (Optional) Derived Service Names
  - For each ServiceExport, create a derived Service named derived-$hashServiceExport, where $hash is computed from the ServiceExport name.
  - One EndpointSlice per source cluster is created for every exported service. The EndpointSlice is named derived-$hashServiceExport-$clusterId.
- CoreDNS Integration for Headless Services
  - Although the IP address resolution for headless services is handled by CoreDNS, actual EndpointSlices are still required because they serve as data sources for DNS records.
  - Since the EndpointSlices themselves do not differ between Headless and ClusterIP services, no conditional branching is needed: they can be processed exactly like the EndpointSlices of other services.
- Labeling for Multi-Cluster
  - EndpointSlices should include the label multicluster.kubernetes.io/service-name: <clusterset service name>, which is required by the KEP.
  - For better UX and resource management, we also recommend adding the same label to the derived Service.
- (Optional) OwnerReferences for Automatic Garbage Collection
  - The ServiceImport object is set as the owner of the derived Service.
  - The derived Service is set as the owner of the EndpointSlice.
  - This hierarchy allows you to track resource relationships efficiently and simplifies finalizer handling.
- Service Type Conversion
  - LoadBalancer and NodePort services are converted to ClusterIP in the derived Services.
  - For headless services (clusterIP: None), create corresponding headless services in target clusters to maintain DNS resolution consistency. This lets the EndpointSlice controller in each cluster create EndpointSlices containing the IPs of Pods matching the headless service's selector, adhering to namespace sameness.
  - ExternalName services cannot be exported.
- Conflict Resolution
  - Only one ServiceImport is created when the same service is exported from multiple clusters (i.e., we do not create per-cluster ServiceImport objects).
  - Warnings are issued when mixing headless and non-headless services.
  - Conflicts are communicated via ServiceExport Conditions.
- Scalability and Error Handling
  - Efficient EndpointSlice updates minimize inter-cluster communication overhead.
  - Validation failures are reflected in ServiceExport status conditions.
  - A retry mechanism should handle transient resource creation or update issues.
- (Optional) Monitoring and Debugging
  - Resources are labeled appropriately for easier traceability and debugging.
  - Events are recorded for state changes.
  - Metrics can be exported to monitor the MCS controller’s performance and potential bottlenecks.
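The conflict-resolution rules can be illustrated with a small aggregation sketch. It assumes the ServiceExports from all clusters have already been collected in one place, as plain records with the hypothetical keys `cluster`, `namespace`, `name`, `headless`, and `ports`. Picking the lexicographically first cluster as the winner is only a placeholder for determinism; the actual precedence rule is a policy decision.

```python
from collections import defaultdict


def aggregate_exports(exports):
    """Return (service_imports, conflicts) for a list of export records."""
    grouped = defaultdict(list)
    for export in exports:
        grouped[(export["namespace"], export["name"])].append(export)

    imports, conflicts = [], []
    for (namespace, name), group in sorted(grouped.items()):
        group.sort(key=lambda e: e["cluster"])
        winner = group[0]  # assumption: placeholder precedence rule

        # Mixing headless and non-headless exports is reported as a conflict.
        if len({e["headless"] for e in group}) > 1:
            conflicts.append({
                "namespace": namespace,
                "name": name,
                "message": "headless and non-headless exports are mixed",
            })

        # Exactly one ServiceImport per exported service, listing every
        # cluster that exports it.
        imports.append({
            "apiVersion": "multicluster.k8s.io/v1alpha1",
            "kind": "ServiceImport",
            "metadata": {"name": name, "namespace": namespace},
            "spec": {
                "type": "Headless" if winner["headless"] else "ClusterSetIP",
                "ports": winner["ports"],
            },
            "status": {"clusters": [{"cluster": e["cluster"]} for e in group]},
        })
    return imports, conflicts
```

The returned conflicts would then be written back as ServiceExport conditions in the affected source clusters.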
Architectural Considerations
- Sveltos's Role
  - Detection of ServiceExports.
  - (Optional) Monitoring of ServiceExports.
  - Automatic creation and cleanup of EndpointSlice, derived Service, and ServiceImport objects.
  - State synchronization across clusters.
- Namespace Sameness
  - The same namespace structure must exist across clusters to avoid conflicts.
  - Proper synchronization of namespace resources is assumed (integrations with Sveltos’s CRDs or other tooling).
Thank you @kahirokunn
Let me summarise it to see if I got it.
- When a ServiceExport is created in ClusterA:
  - Sveltos gets the corresponding Kubernetes Service and Endpoints and creates an EndpointSlice.
    - This is supposed to be created in the other clusters that are part of the clusterSet.
    - In your example the port is https/443. Is this a constant?
  - Sveltos creates a ServiceImport in the other clusters that are part of the clusterSet.
  - If the corresponding Kubernetes Service does not exist, Sveltos creates a Service with no selector and type ClusterSetIP in the other clusters that are part of the clusterSet.
Is that correct?
Thank you @gianlucam76. Yes, that's correct! Let me clarify about the port:
The port (443/https in the example) is not a constant - it matches exactly with the port of the Kubernetes Service that has the same name as the ServiceExport.
Everything else in your summary is accurate!
Also, one important point to add: Even if you have three clusters in a clusterset and two of them create ServiceExports, there should only be one ServiceImport created.
Thank you. This is achievable already with Sveltos. Next week (I am pretty tight this week), I will prepare the Sveltos configuration for this and share. We can make sure we create only one ServiceImport by collecting all ServiceExports in the management cluster and then aggregating from there before posting ServiceImports in the other clusters.
I might need your help testing it though.
I have re-read the KEP and edited the implementation guide for the proposal, so I would appreciate it if you could check it.
Sorry for the delay. I had a good few issues to work on this week, so I had to postpone this. Thank you.
I know you are always innovating. Thank you for your innovation!
To improve clarity, I have consolidated all the information—including the implementation guide—into the issue’s body for centralized reference.
Additionally, the aws-load-balancer-controller is planned to support EndpointSlices containing custom IPs. Once this feature is implemented, it should become feasible to register the EndpointSlice (aggregated by Sveltos in the Ingress resource) with the ALB's TargetGroup.
For further details, please refer to: https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/4017 🙌