
Enable topology spread constraints on `tcp` deployment

Open bsctl opened this issue 3 years ago • 3 comments

Feature description

Support topology spread constraints to control how tcp replicas are spread across the admin cluster among failure domains such as zones, racks, hosts, and other user-defined topology domains. This helps achieve more robust high availability of the Tenant Control Planes as well as efficient resource utilisation.

A simple proposal:

apiVersion: kamaji.clastix.io/v1alpha1
kind: TenantControlPlane
metadata:
  name: tenant-00
  namespace: default
spec:
  controlPlane:
    deployment:
      replicas: 3
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: tcp      
...

According to the definition above, the pods running the tcp will have:

    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: tcp

Configure topology spread constraints by assigning the topology key label topology.kubernetes.io/zone to the Kamaji admin cluster nodes hosting the tenants' tcp pods:

kubectl get nodes --show-labels

NAME              STATUS   ROLES                  AGE   VERSION   LABELS
kamaji-infra-00   Ready    <none>                 15h   v1.23.9   topology.kubernetes.io/zone=zone-a
kamaji-infra-01   Ready    <none>                 15h   v1.23.9   topology.kubernetes.io/zone=zone-b
kamaji-infra-02   Ready    <none>                 15h   v1.23.9   topology.kubernetes.io/zone=zone-c
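
If the admin cluster nodes do not already carry the zone label, it can be assigned with kubectl, for example (zone names follow the listing above):

kubectl label node kamaji-infra-00 topology.kubernetes.io/zone=zone-a
kubectl label node kamaji-infra-01 topology.kubernetes.io/zone=zone-b
kubectl label node kamaji-infra-02 topology.kubernetes.io/zone=zone-c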

bsctl avatar Aug 06 '22 15:08 bsctl

To have pods scheduled according to topology constraints, we could also set the constraints at the cluster level by creating a global scheduler configuration:

apiVersion: kubescheduler.config.k8s.io/v1beta3
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: PodTopologySpread
        args:
          defaultConstraints:
            - maxSkew: 1
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: DoNotSchedule
          defaultingType: List
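
For context, a configuration like this is typically passed to the kube-scheduler through its --config flag; on the admin cluster that would roughly look like the following (the file path is illustrative):

kube-scheduler --config=/etc/kubernetes/scheduler-config.yaml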

Since topology constraints are specific to the admin cluster hosting the tcp and not to the individual tcp, this seems a wiser option. @prometherion

bsctl avatar Aug 08 '22 18:08 bsctl

Since topology constraints are specific to the admin cluster hosting the tcp and not to the individual tcp, this seems a wiser option.

This would be applied to all the Pods in the cluster, and not just to the TenantControlPlane ones, wouldn't it?

I would suggest adding the Deployment topologySpreadConstraints field proposed here, since it wouldn't require a global scheduler configuration.

prometherion avatar Aug 09 '22 07:08 prometherion

This would be applied to all the Pods in the cluster, and not just to the TenantControlPlane ones, wouldn't it?

Yes, it's a global behaviour unless the specific pod defines its own topologySpreadConstraints. For sure, letting each tcp define its own setting would be a more flexible solution.

bsctl avatar Aug 09 '22 08:08 bsctl

    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: tcp

Question on this: the matchLabels would have to be known in advance by the Cluster Administrator. If we want to make this feature totally transparent, I'd say we could replicate the same keys (maxSkew, topologyKey, whenUnsatisfiable, minDomains, nodeAffinityPolicy, nodeTaintsPolicy) while omitting labelSelector, which would be computed by the Kamaji operator.
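
For illustration, a minimal sketch of what that could look like in the TenantControlPlane spec, assuming Kamaji computes the selector itself (the field layout mirrors the proposal at the top of the issue):

    spec:
      controlPlane:
        deployment:
          replicas: 3
          topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: topology.kubernetes.io/zone
            whenUnsatisfiable: DoNotSchedule
            # labelSelector intentionally omitted: the operator would compute it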

prometherion avatar Aug 29 '22 19:08 prometherion

omitting labelSelector, which would be computed by the Kamaji operator

Probably this might sound like an override of the general topologySpreadConstraints feature; let's say it is the responsibility of the admin to set the label properly. What's your thought?

bsctl avatar Aug 30 '22 13:08 bsctl

I think the feature is precious; I'm just saying that when deploying a Tenant Control Plane the Cluster Administrator would have to know in advance the Pod labels used, because otherwise the spread constraint wouldn't work.

Actually, all the Control Plane Pods get the label kamaji.clastix.io/soot=${tenantControlPlane.name}, which is non-intuitive for a newcomer. What I can suggest is the following approach:

  1. if no labelSelector is provided in the topologySpreadConstraints, use the default Kamaji labels (as in the sketch below);
  2. otherwise, use the provided one.

With that said, the Cluster Administrator can also play with the additional metadata for the Deployment to add different labels, so there's no mandatory need to use the default ones.
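
For illustration, with approach 1 the constraint rendered into the Pods of a TenantControlPlane named tenant-00 could end up looking like this (a sketch; defaulting to the soot label mentioned above is an assumption about the eventual implementation):

    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            kamaji.clastix.io/soot: tenant-00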

I can start working on this.

prometherion avatar Aug 30 '22 15:08 prometherion