feat: add custom topologySpreadConstraints support to coredns
Context on the change
Recently at WildlifeStudios, we had a brief CoreDNS outage in one of our clusters: a single node crashed, and all of the CoreDNS pods were running on it.
To avoid this happening again, we would like to set our own topologySpreadConstraints parameters according to our use case.
What does this PR do?
This PR adds support for customizing the topologySpreadConstraints field in the CoreDNS add-on template.
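As a rough sketch of the intended usage, the new field would sit next to the other kube-dns settings in the Cluster spec. The field name and placement below are illustrative of what this PR proposes, not a released kops API:

```yaml
# Illustrative sketch only: where a user-supplied constraint would live.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: example.k8s.local
spec:
  kubeDNS:
    provider: CoreDNS
    topologySpreadConstraints:        # proposed field; name/placement per this PR
      - labelSelector:
          matchLabels:
            k8s-app: kube-dns
        maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
```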
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: thiagoluiznunes / name: Thiago Luiz (5158908340ce1e4445a0f6b6c0be7dee88eaa1f7, 824192b9170804c645f791c3fafa06ab7bedc310)
Hi @thiagoluiznunes. Thanks for your PR.
I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign zetaab for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
Hey folks, could you review the PR? cc @hakman @justinsb
@justinsb Could you please take a look? /lgtm /assign @justinsb
/retest
/retest
@thiagoluiznunes Don't worry, it's a known issue. /override pull-kops-e2e-cni-flannel
Could you also share what value you would like to use in your case and why?
@hakman: Overrode contexts on behalf of hakman: pull-kops-e2e-cni-flannel
In response to this:
/retest
@thiagoluiznunes Don't worry, it's a known issue. /override pull-kops-e2e-cni-flannel
Could you also share what value you would like to use in your case and why?
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@hakman when you ask about my case, do you mean the /retest or the feature? Enabling customization of the topologySpreadConstraints is necessary because we have some small workload clusters at Wildlife with only 2 or 3 nodes. With so few nodes, the CoreDNS pods sometimes all end up scheduled on the same node, and that is dangerous if that node fails. Does it make sense?
/retest
Hi @hakman, I wanted to follow up and see if you have any updates on this discussion.
@thiagoluiznunes: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-kops-e2e-gce-cni-calico | 824192b9170804c645f791c3fafa06ab7bedc310 | link | unknown | /test pull-kops-e2e-gce-cni-calico |
| pull-kops-e2e-gce-cni-kindnet | 824192b9170804c645f791c3fafa06ab7bedc310 | link | unknown | /test pull-kops-e2e-gce-cni-kindnet |
| pull-kops-e2e-k8s-aws-amazonvpc-u2404 | 824192b9170804c645f791c3fafa06ab7bedc310 | link | true | /test pull-kops-e2e-k8s-aws-amazonvpc-u2404 |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Hi @hakman, we would like to set the following values as the default for topologySpreadConstraints:
```yaml
topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        k8s-app: kube-dns
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
  - labelSelector:
      matchLabels:
        k8s-app: kube-dns
    maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
```
What you are trying makes sense, but I don't think it will work due to maxSkew.
Have you tried using maxSurge and maxUnavailable?
https://kops.sigs.k8s.io/operations/rolling-update/
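For reference, those settings live in the instance group's rolling-update configuration described at that link; the values below are only an example, not a recommendation:

```yaml
# Example only: rolling-update settings on a kops InstanceGroup (see link above).
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes
spec:
  rollingUpdate:
    maxSurge: 1        # bring up one extra node before draining an old one
    maxUnavailable: 0  # never drop below the desired node count during a roll
```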
@hakman Those fields have no effect in our case, since there is no kops rolling update involved: Karpenter removes the nodes that fail, and that is what ends up affecting CoreDNS. Therefore, using maxSurge or maxUnavailable would not make a difference.
Hi @hakman. Do you have any updates on this discussion?
@thiagoluiznunes We had lots of discussions on this topic (thanks @justinsb). The conclusion is that we would prefer to change the default to whenUnsatisfiable: DoNotSchedule. Would that be ok with you?
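Concretely, that proposal amounts to flipping the constraint in the CoreDNS manifest from the current soft setting to a hard one, roughly like this (a sketch only; the exact shape of the kops template may differ):

```yaml
# Sketch of the proposed default change; the real kops CoreDNS template may differ.
topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        k8s-app: kube-dns
    maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule   # currently ScheduleAnyway (soft constraint)
```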
@hakman I would like to understand why the custom option for topologySpreadConstraints is not enabled. It could potentially be beneficial for us (host: DoNotSchedule, zone: DoNotSchedule), but enforcing this configuration might introduce issues in unanticipated corner cases. What drawbacks would kops face by incorporating this custom configuration? I believe that adopting the custom option would primarily provide advantages. Does it make sense?
We prefer to not add another config option here, or another dependency on the topologySpreadConstraints type.
Hi @hakman,
Could we change the default value to DoNotSchedule? This adjustment would resolve our issue. What are your thoughts? I can update the pull request accordingly. Could it apply to both the zone and the host fields?
Hi @hakman,
After discussing the recent office hours with my team, we raised a point about the inability to customize the configuration. Essentially, we identified a new corner case related to our infrastructure. We have two scenarios: small clusters and large clusters. Setting DoNotSchedule for both host and zone would solve the problem for small clusters, but it would increase costs for medium and large clusters. For instance, we have a production cluster with 11 CoreDNS pods running smoothly with the current topology set to ScheduleAnyway. If we changed the parameter to DoNotSchedule, new nodes would be created unnecessarily just to satisfy the topology. It's possible that other users could encounter the same scenario. What are your thoughts?
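To make the trade-off concrete, here is an illustrative sketch (not part of any existing kops API) of what a small cluster would want if the constraint were customizable:

```yaml
# Illustrative only: why one default does not fit both cluster sizes.
# whenUnsatisfiable semantics in the Kubernetes scheduler:
#   DoNotSchedule  - hard: pods violating maxSkew stay Pending, which can make an
#                    autoscaler such as Karpenter add nodes just to place them.
#   ScheduleAnyway - soft: the scheduler only tries to minimize the skew.

# Small clusters (2-3 nodes): never co-locate all CoreDNS pods on one node.
topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        k8s-app: kube-dns
    maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
```

Large clusters with many replicas would keep `whenUnsatisfiable: ScheduleAnyway` so the autoscaler does not create extra nodes just to satisfy the skew.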
@hakman ping :D
@thiagoluiznunes https://github.com/kubernetes/kops/pull/17472 change should add the desired behaviour to CoreDNS. Sorry for the delay. /close
@hakman: Closed this PR.
In response to this:
@thiagoluiznunes https://github.com/kubernetes/kops/pull/17472 change should add the desired behaviour to CoreDNS. Sorry for the delay. /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.