Feature Request: Provide an MCS Controller Implementation Using Sveltos’s Event Framework

Open kahirokunn opened this issue 10 months ago • 7 comments

Proposal

We propose a feature (or guideline documentation) illustrating how to implement an MCS (Multi-Cluster Services) controller using Sveltos’s Event Framework.

The solution adheres to KEP-1645, following the principle of namespace sameness. It manages ClusterIP, Headless, and LoadBalancer/NodePort services consistently, while explicitly disallowing ExternalName services.

Background

The Kubernetes MCS API standardizes how services can be exported from one cluster and discovered in others. By pairing this with Sveltos’s Event Framework, we can fully automate the ServiceExport → derived Service + EndpointSlice + ServiceImport pipeline. This removes the burden of manually provisioning these resources across multiple clusters.

Benefits

  • Fully Automated Creation: Eliminates manual steps for provisioning MCS resources across clusters.
  • Unified Approach for Multiple Service Types: ClusterIP, NodePort, and LoadBalancer services are all converted consistently (with optional special handling for headless).
  • Clean Resource Ownership: OwnerReferences let Kubernetes handle garbage collection automatically, simplifying lifecycle management.
  • DNS-Friendly for Headless Services: CoreDNS can resolve these exported headless services, thanks to the EndpointSlice objects.
  • Extensible Design: Sveltos’s Event Framework can be extended to track additional resource states or integrate with other multi-cluster components if needed.

By implementing the strategies outlined in the guide below with Sveltos’s Event Framework, platform engineers can reliably export services from any source cluster and consume them with minimal configuration overhead. This accelerates multi-cluster use cases (high availability, traffic optimization, cross-environment integrations) without inventing a proprietary approach.

MCS Controller Implementation Guide Using Sveltos

Overview

This guide details how to automate the implementation of the Kubernetes Multi-Cluster Services (MCS) API using Sveltos's Event Framework.
Based on KEP-1645 and the principle of namespace sameness, we present accurate conversion patterns for each service type.

In this revised document, we introduce derived Services named derived-$hashServiceExport, where $hashServiceExport stands for a hash computed from the ServiceExport name. Refer to the following implementation for hashing: https://github.com/kubernetes-sigs/mcs-api/blob/b4f72b8c11b640b049a2c247994a2de3eb0dda75/pkg/controllers/common.go#L39-L43

We also create one EndpointSlice for each source cluster associated with the ServiceExport, named derived-$hashServiceExport-$clusterId. For clarity, the label multicluster.kubernetes.io/service-name: <clusterset service name> is added to both the EndpointSlice and the derived Service. Additionally, OwnerReferences make the ServiceImport the owner of the derived Service, and the derived Service the owner of the EndpointSlice.
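
The naming scheme above can be sketched in Python as follows. This is a minimal sketch, assuming the linked mcs-api helper hashes the namespaced name ("namespace/name") with SHA-256 and keeps the first 10 characters of the lowercase, unpadded base32hex encoding; check the linked source before relying on byte-for-byte compatibility. The function names derived_service_name and endpoint_slice_name are hypothetical.

```python
import base64
import hashlib

# Map the standard base32 alphabet onto the "extended hex" alphabet
# (RFC 4648, section 7), so the sketch does not depend on newer stdlib helpers.
_STD_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
_HEX_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUV"
_TO_HEX = str.maketrans(_STD_ALPHABET, _HEX_ALPHABET)


def derived_service_name(namespace: str, name: str) -> str:
    """Return "derived-" plus a 10-character hash of "namespace/name".

    Assumption: this mirrors the hashing in the linked mcs-api code.
    """
    digest = hashlib.sha256(f"{namespace}/{name}".encode()).digest()
    encoded = base64.b32encode(digest).decode().rstrip("=").translate(_TO_HEX)
    return "derived-" + encoded.lower()[:10]


def endpoint_slice_name(namespace: str, name: str, cluster_id: str) -> str:
    """Return the per-cluster EndpointSlice name: derived-<hash>-<clusterId>."""
    return f"{derived_service_name(namespace, name)}-{cluster_id}"


if __name__ == "__main__":
    print(derived_service_name("default", "web-service"))
    print(endpoint_slice_name("default", "web-service", "cluster-a"))
```

Because the name depends only on the namespace and the ServiceExport name, every cluster computes the same derived Service name without coordination.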

Processing Patterns by Service Type

For reference: https://github.com/kubernetes/enhancements/blob/master/keps/sig-multicluster/1645-multi-cluster-services-api/README.md#clusterset-service-behavior-expectations

1. ClusterIP / LoadBalancer / NodePort Services

These service types can be handled with the same processing pattern. Below is an example:

Original Service in Source Cluster (ClusterID: cluster-a)

apiVersion: v1
kind: Service
metadata:
  name: web-service
  namespace: default
spec:
  type: ClusterIP  # or LoadBalancer or NodePort
  selector:
    app: web
  ports:
    - name: http
      port: 80
      targetPort: 8080

ServiceExport in Source Cluster (ClusterID: cluster-a)

apiVersion: multicluster.k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: web-service
  namespace: default

ClusterIP Service generated in cluster-a and cluster-b

apiVersion: v1
kind: Service
metadata:
  name: derived-$hashServiceExport
  namespace: default
  labels:
    multicluster.kubernetes.io/service-name: web-service
    multicluster.kubernetes.io/service-imported: "true"
    app.kubernetes.io/managed-by: sveltos
  ownerReferences:
  - apiVersion: multicluster.k8s.io/v1alpha1
    kind: ServiceImport
    name: web-service
    # other fields (uid, controller, blockOwnerDeletion) required by OwnerReference
spec:
  type: ClusterIP
  selector:  # Selector is maintained based on namespace sameness
    app: web
  ports:
    - name: http
      port: 80
      targetPort: 8080

ServiceImport generated in cluster-a and cluster-b

apiVersion: multicluster.k8s.io/v1alpha1
kind: ServiceImport
metadata:
  name: web-service
  namespace: default
  annotations:
    multicluster.kubernetes.io/derived-service: derived-$hashServiceExport
spec:
  type: ClusterSetIP
  ports:
    - name: http
      port: 80
      protocol: TCP
  ips:
    - "10.96.0.1"  # ClusterIP assigned to the derived Service, e.g. kubectl get svc derived-$hashServiceExport -o yaml | yq '.spec.clusterIP'
status:
  clusters:
  - cluster: cluster-a

EndpointSlice generated in cluster-a and cluster-b

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: derived-$hashServiceExport-cluster-a
  namespace: default
  labels:
    kubernetes.io/service-name: derived-$hashServiceExport
    multicluster.kubernetes.io/service-name: web-service
    cluster.x-k8s.io/cluster-name: cluster-a
    endpointslice.kubernetes.io/managed-by: sveltos
  ownerReferences:
  - apiVersion: multicluster.k8s.io/v1alpha1
    kind: ServiceImport
    name: web-service
    # other fields (uid, controller, blockOwnerDeletion) required by OwnerReference
addressType: IPv4
ports:
  - name: http
    protocol: TCP
    port: 80
endpoints:
  - addresses:
      - "10.0.1.1"
    conditions:
      ready: true
    nodeName: node-a
  - addresses:
      - "10.0.1.2"
    conditions:
      ready: true
    nodeName: node-b
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: derived-$hashServiceExport-cluster-b
  namespace: default
  labels:
    kubernetes.io/service-name: derived-$hashServiceExport
    multicluster.kubernetes.io/service-name: web-service
    cluster.x-k8s.io/cluster-name: cluster-b
    endpointslice.kubernetes.io/managed-by: sveltos
  ownerReferences:
  - apiVersion: multicluster.k8s.io/v1alpha1
    kind: ServiceImport
    name: web-service
    # other fields (uid, controller, blockOwnerDeletion) required by OwnerReference
addressType: IPv4
ports:
  - name: http
    protocol: TCP
    port: 80
endpoints:
  - addresses:
      - "10.0.1.3"
    conditions:
      ready: true
    nodeName: node-a
  - addresses:
      - "10.0.1.4"
    conditions:
      ready: true
    nodeName: node-b
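
The conversion from the original Service to the derived Service shown above can be sketched as a pure function over manifests. This is an illustrative sketch, not Sveltos's implementation: build_derived_service is a hypothetical helper, manifests are plain Python dicts, and ownerReferences are left out because they need the UID of the live ServiceImport.

```python
import copy

# Port fields carried over to the derived Service. nodePort is dropped on
# purpose because the derived Service is always of type ClusterIP.
_PORT_FIELDS = ("name", "protocol", "port", "targetPort")


def build_derived_service(source: dict, derived_name: str) -> dict:
    """Build the derived ClusterIP Service manifest for an exported Service."""
    spec = source.get("spec", {})
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": derived_name,
            "namespace": source["metadata"]["namespace"],
            "labels": {
                "multicluster.kubernetes.io/service-name": source["metadata"]["name"],
                "multicluster.kubernetes.io/service-imported": "true",
                "app.kubernetes.io/managed-by": "sveltos",
            },
        },
        "spec": {
            # ClusterIP, LoadBalancer and NodePort all become ClusterIP.
            "type": "ClusterIP",
            # The selector is kept, following namespace sameness.
            "selector": copy.deepcopy(spec.get("selector", {})),
            "ports": [
                {k: p[k] for k in _PORT_FIELDS if k in p}
                for p in spec.get("ports", [])
            ],
        },
    }
```

Keeping the function free of API calls makes it easy to unit test; applying the resulting manifest is left to whatever renders the Sveltos policy.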

2. Headless Services (clusterIP: None)

Headless services require special handling because they rely on DNS-based service discovery instead of a VIP (Virtual IP). One might assume an MCS controller only needs to create ServiceImport objects for them. However, since the CoreDNS multicluster plugin uses EndpointSlices as its record source (reference: https://github.com/coredns/multicluster/blob/49f47d950355f793d656aec8a6d198daf1d888b1/multicluster.go#L347-L381), the controller must also create an EndpointSlice for each source cluster for DNS-based service discovery to work.

Original Service in Source Cluster (ClusterID: cluster-a)

apiVersion: v1
kind: Service
metadata:
  name: stateful-service
  namespace: default
spec:
  clusterIP: None
  selector:
    app: stateful
  ports:
    - name: http
      port: 80
      targetPort: 8080

ServiceExport in Source Cluster (ClusterID: cluster-a)

apiVersion: multicluster.k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: stateful-service
  namespace: default

ServiceImport generated in cluster-a and cluster-b

apiVersion: multicluster.k8s.io/v1alpha1
kind: ServiceImport
metadata:
  name: stateful-service
  namespace: default
  annotations:
    multicluster.kubernetes.io/derived-service: derived-$hashServiceExport
spec:
  type: Headless
  ports:
    - name: http
      port: 80
      protocol: TCP
status:
  clusters:
  - cluster: cluster-a

EndpointSlice generated in cluster-a and cluster-b

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: derived-$hashServiceExport-cluster-a
  namespace: default
  labels:
    kubernetes.io/service-name: stateful-service
    multicluster.kubernetes.io/service-name: stateful-service
    cluster.x-k8s.io/cluster-name: cluster-a
    endpointslice.kubernetes.io/managed-by: sveltos
  ownerReferences:
  - apiVersion: multicluster.k8s.io/v1alpha1
    kind: ServiceImport
    name: stateful-service
    # other fields (uid, controller, blockOwnerDeletion) required by OwnerReference
addressType: IPv4
ports:
  - name: http
    protocol: TCP
    port: 80
endpoints:
  - addresses:
      - "10.0.2.1"
    conditions:
      ready: true
    nodeName: node-c
  - addresses:
      - "10.0.2.2"
    conditions:
      ready: true
    nodeName: node-d
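
Since the EndpointSlice has the same shape for headless and non-headless exports, one builder can serve both cases. The sketch below is illustrative only: build_endpoint_slice and its parameters are hypothetical, endpoints are passed as (ip, node_name, ready) tuples, and IPv4 addressing is assumed as in the examples above.

```python
def build_endpoint_slice(slice_name, namespace, backing_service,
                         clusterset_service, cluster_id, ports, endpoints):
    """Build one per-cluster EndpointSlice manifest.

    backing_service is the value of kubernetes.io/service-name,
    clusterset_service the value of multicluster.kubernetes.io/service-name.
    """
    return {
        "apiVersion": "discovery.k8s.io/v1",
        "kind": "EndpointSlice",
        "metadata": {
            "name": slice_name,
            "namespace": namespace,
            "labels": {
                "kubernetes.io/service-name": backing_service,
                "multicluster.kubernetes.io/service-name": clusterset_service,
                "cluster.x-k8s.io/cluster-name": cluster_id,
                "endpointslice.kubernetes.io/managed-by": "sveltos",
            },
        },
        "addressType": "IPv4",  # assumption: IPv4 only, as in the examples
        "ports": [
            {"name": p["name"], "protocol": p.get("protocol", "TCP"), "port": p["port"]}
            for p in ports
        ],
        "endpoints": [
            {"addresses": [ip], "conditions": {"ready": ready}, "nodeName": node}
            for ip, node, ready in endpoints
        ],
    }
```

For example, the headless EndpointSlice above corresponds to calling the builder with backing_service="stateful-service", cluster_id="cluster-a", and the two endpoints 10.0.2.1 (node-c) and 10.0.2.2 (node-d).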

3. ExternalName Services

ExternalName services cannot be exported.

Original Service in Source Cluster (ClusterID: cluster-a)

apiVersion: v1
kind: Service
metadata:
  name: external-db
  namespace: default
spec:
  type: ExternalName
  externalName: db.example.com

ServiceExport (ClusterID: cluster-a)

apiVersion: multicluster.k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: external-db
  namespace: default

This ServiceExport will fail with the following status:

apiVersion: multicluster.k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: external-db
  namespace: default
status:
  conditions:
    - type: InvalidService
      status: "True"
      reason: UnsupportedServiceType
      message: "ExternalName services cannot be exported"
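
The rejection above amounts to a small validation step that runs before any derived resources are generated. A minimal sketch, assuming manifests are plain dicts; invalid_service_condition is a hypothetical helper, and the condition fields simply mirror the status shown above.

```python
from typing import Optional


def invalid_service_condition(service: dict) -> Optional[dict]:
    """Return the InvalidService condition for a Service that cannot be
    exported, or None when the Service type is exportable."""
    # Services without an explicit type default to ClusterIP.
    service_type = service.get("spec", {}).get("type", "ClusterIP")
    if service_type == "ExternalName":
        return {
            "type": "InvalidService",
            "status": "True",
            "reason": "UnsupportedServiceType",
            "message": "ExternalName services cannot be exported",
        }
    return None
```

A caller would write the returned condition into the ServiceExport status and skip creation of the ServiceImport, derived Service, and EndpointSlices.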

Key Implementation Points

  1. Selector Maintenance

    • Following namespace sameness principles, all imported Services maintain their selectors.
    • This enables Pods with matching labels in the importing cluster to be automatically added as service endpoints when applicable.
  2. (Optional) Derived Service Names

    • For each ServiceExport, create a derived Service named derived-$hashServiceExport where $hash is computed from the ServiceExport name.
    • One EndpointSlice per source cluster is created for every exported service. The EndpointSlice is named derived-$hashServiceExport-$clusterId.
  3. CoreDNS Integration for Headless Services

    • Although the IP address resolution for headless services is handled by CoreDNS, actual EndpointSlices are still required because they serve as data sources for DNS records.
    • Since the EndpointSlices themselves do not differ between Headless and ClusterIP services, no conditional branching is needed; they can be processed exactly like the EndpointSlices for other services.
  4. Labeling for Multi-Cluster

    • EndpointSlices should include the label multicluster.kubernetes.io/service-name: <clusterset service name>, which is required by the KEP.
    • For better UX and resource management, we also recommend adding the same label to the derived Service.
  5. (Optional) OwnerReferences for Automatic Garbage Collection

    • The ServiceImport object is set as the owner of the derived Service.
    • The derived Service is set as the owner of the EndpointSlice.
    • This hierarchy allows you to track resource relationships efficiently and simplifies finalizer handling.
  6. Service Type Conversion

    • LoadBalancer and NodePort services are converted to ClusterIP in the derived Services.
    • For headless services (clusterIP: None), create corresponding headless services in target clusters to maintain DNS resolution consistency. The EndpointSlice controller in each target cluster can then create EndpointSlices containing the IPs of local Pods that match the headless service's selector, adhering to namespace sameness.
    • ExternalName services cannot be exported.
  7. Conflict Resolution

    • Only one ServiceImport is created when the same service is exported from multiple clusters (i.e., we do not create per-cluster ServiceImport objects).
    • Warnings are issued when mixing headless and non-headless services.
    • Conflicts are communicated via ServiceExport Conditions.
  8. Scalability and Error Handling

    • Efficient EndpointSlice updates minimize inter-cluster communication overhead.
    • Validation failures are reflected in ServiceExport status conditions.
    • A retry mechanism should handle transient resource creation or update issues.
  9. (Optional) Monitoring and Debugging

    • Resources are labeled appropriately for easier traceability and debugging.
    • Events are recorded for state changes.
    • Metrics can be exported to monitor the MCS controller’s performance and potential bottlenecks.
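
The conflict-resolution point (one ServiceImport per exported service, with a warning when headless and non-headless exports are mixed) can be sketched as a merge over the per-cluster exports. This is a sketch under stated assumptions: merge_exports and import_type are hypothetical names, and "first cluster ID in sorted order wins" is only a stand-in policy; a real controller may prefer a different precedence, such as the oldest ServiceExport.

```python
def import_type(service: dict) -> str:
    """Map an exported Service to the ServiceImport type."""
    headless = service.get("spec", {}).get("clusterIP") == "None"
    return "Headless" if headless else "ClusterSetIP"


def merge_exports(namespace: str, name: str, exports: dict):
    """Merge exports of one service into a single ServiceImport.

    exports maps cluster ID -> exported Service manifest.
    Returns (service_import, conflicting_cluster_ids).
    """
    if not exports:
        raise ValueError("at least one export is required")
    cluster_ids = sorted(exports)
    winner = exports[cluster_ids[0]]  # stand-in precedence policy
    chosen = import_type(winner)
    # Clusters whose export disagrees with the chosen type; these would get
    # a conflict condition on their ServiceExport.
    conflicting = [c for c in cluster_ids if import_type(exports[c]) != chosen]
    service_import = {
        "apiVersion": "multicluster.k8s.io/v1alpha1",
        "kind": "ServiceImport",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": chosen,
            "ports": [
                {"name": p["name"], "port": p["port"], "protocol": p.get("protocol", "TCP")}
                for p in winner.get("spec", {}).get("ports", [])
            ],
        },
        # Listing every exporting cluster is a choice made by this sketch.
        "status": {"clusters": [{"cluster": c} for c in cluster_ids]},
    }
    return service_import, conflicting
```

The returned list of conflicting cluster IDs is what would drive the ServiceExport conditions mentioned in the list above.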

Architectural Considerations

  1. Sveltos's Role

    • Detection of ServiceExports.
    • (Optional) Monitoring of ServiceExports.
    • Automatic creation and cleanup of EndpointSlice, derived Service, and ServiceImport objects.
    • State synchronization across clusters.
  2. Namespace Sameness

    • The same namespace structure must exist across clusters to avoid conflicts.
    • Proper synchronization of namespace resources is assumed (for example, through integrations with Sveltos’s CRDs or other tooling).
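
The "automatic creation and cleanup" role listed above can be pictured as a diff between the objects that should exist and the objects that do exist in a cluster. The sketch below is purely illustrative and does not describe how Sveltos tracks resources internally; plan_sync is a hypothetical helper, and objects are keyed by (kind, namespace, name).

```python
def plan_sync(desired: dict, actual: dict) -> dict:
    """Compute which objects to create, update, or delete.

    Both arguments map (kind, namespace, name) -> manifest dict.
    """
    return {
        # Wanted but missing in the cluster.
        "create": sorted(k for k in desired if k not in actual),
        # Present in both, but the manifest differs.
        "update": sorted(k for k in desired if k in actual and desired[k] != actual[k]),
        # Present in the cluster but no longer wanted, e.g. the ServiceExport
        # was deleted.
        "delete": sorted(k for k in actual if k not in desired),
    }
```

When a ServiceExport is removed, its ServiceImport drops out of the desired set and lands in the delete list; with OwnerReferences in place, the dependent objects would then be garbage collected by Kubernetes.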

kahirokunn, Jan 14 '25 06:01