feat: add custom topologySpreadConstraints support to coredns
Context on the change
Recently at WildlifeStudios, we had a brief CoreDNS outage in one of our clusters: a single node crashed, and all of the CoreDNS pods were running on it.
To avoid this happening again, we would like to set our own topologySpreadConstraints parameters according to our use case.
What does this PR do?
This PR adds support for customizing the topologySpreadConstraints field in the CoreDNS add-on template.
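As a rough sketch of the intended usage, the new field would sit next to the other kube-dns settings in the Cluster spec. The field name and placement below are illustrative of what this PR proposes, not a released kops API:

```yaml
# Illustrative sketch only: where a user-supplied constraint would live.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: example.k8s.local
spec:
  kubeDNS:
    provider: CoreDNS
    topologySpreadConstraints:        # proposed field; name/placement per this PR
      - labelSelector:
          matchLabels:
            k8s-app: kube-dns
        maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
```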
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: thiagoluiznunes / name: Thiago Luiz (5158908340ce1e4445a0f6b6c0be7dee88eaa1f7, 824192b9170804c645f791c3fafa06ab7bedc310)
Hi @thiagoluiznunes. Thanks for your PR.
I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign zetaab for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
Hey folks, could you review the PR? cc @hakman @justinsb
@justinsb Could you please take a look? /lgtm /assign @justinsb
/retest
/retest
@thiagoluiznunes Don't worry, it's a known issue. /override pull-kops-e2e-cni-flannel
Could you also share what value you would like to use in your case and why?
@hakman: Overrode contexts on behalf of hakman: pull-kops-e2e-cni-flannel
In response to this:
/retest
@thiagoluiznunes Don't worry, it's a known issue. /override pull-kops-e2e-cni-flannel
Could you also share what value you would like to use in your case and why?
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@hakman when you ask about my case, do you mean the /retest or the feature? Enabling customization of the topologySpreadConstraints is necessary because we have some small workload clusters at Wildlife with only 2 or 3 nodes. With so few nodes, the CoreDNS pods sometimes all end up scheduled on the same node, and that is dangerous if that node fails. Does it make sense?
/retest
Hi @hakman, I wanted to follow up and see if you have any updates on this discussion.
@thiagoluiznunes: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-kops-e2e-gce-cni-calico | 824192b9170804c645f791c3fafa06ab7bedc310 | link | unknown | /test pull-kops-e2e-gce-cni-calico |
| pull-kops-e2e-gce-cni-kindnet | 824192b9170804c645f791c3fafa06ab7bedc310 | link | unknown | /test pull-kops-e2e-gce-cni-kindnet |
| pull-kops-e2e-k8s-aws-amazonvpc-u2404 | 824192b9170804c645f791c3fafa06ab7bedc310 | link | true | /test pull-kops-e2e-k8s-aws-amazonvpc-u2404 |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Hi @hakman, we would like to set the following values as the default for topologySpreadConstraints:
```yaml
topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        k8s-app: kube-dns
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
  - labelSelector:
      matchLabels:
        k8s-app: kube-dns
    maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
```
What you are trying makes sense, but I don't think it will work due to maxSkew.
Have you tried using maxSurge and maxUnavailable?
https://kops.sigs.k8s.io/operations/rolling-update/
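For reference, those settings live in the instance group's rolling-update configuration described at that link; the values below are only an example, not a recommendation:

```yaml
# Example only: rolling-update settings on a kops InstanceGroup (see link above).
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes
spec:
  rollingUpdate:
    maxSurge: 1        # bring up one extra node before draining an old one
    maxUnavailable: 0  # never drop below the desired node count during a roll
```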
@hakman Those fields have no effect in our case, since there is no kops rolling update involved: Karpenter removes the nodes that fail, and that is what ends up affecting CoreDNS. Therefore, using maxSurge or maxUnavailable would not make a difference.
Hi @hakman. Do you have any updates on this discussion?
@thiagoluiznunes We had lots of discussions on this topic (thanks @justinsb). The conclusion is that we would prefer to change the default to whenUnsatisfiable: DoNotSchedule. Would that be ok with you?
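Concretely, that proposal amounts to flipping the constraint in the CoreDNS manifest from the current soft setting to a hard one, roughly like this (a sketch only; the exact shape of the kops template may differ):

```yaml
# Sketch of the proposed default change; the real kops CoreDNS template may differ.
topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        k8s-app: kube-dns
    maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule   # currently ScheduleAnyway (soft constraint)
```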
@hakman I would like to understand why the custom option for topologySpreadConstraints is not enabled. It could potentially be beneficial for us (host: DoNotSchedule, zone: DoNotSchedule), but enforcing this configuration might introduce issues in unanticipated corner cases. What drawbacks would kops face by incorporating this custom configuration? I believe that adopting the custom option would primarily provide advantages. Does it make sense?
We prefer to not add another config option here, or another dependency on the topologySpreadConstraints type.
Hi @hakman,
Could we change the default value to DoNotSchedule? This adjustment would resolve our issue. What are your thoughts? I can update the pull request accordingly. Could it apply to both the zone and the host fields?
Hi @hakman,
After discussing the recent office hours with my team, we raised a point about the inability to customize the configuration. Essentially, we identified a new corner case related to our infrastructure. We have two scenarios: small clusters and large clusters. Setting DoNotSchedule for both host and zone would solve the problem for small clusters, but it would increase costs for medium and large clusters. For instance, we have a production cluster with 11 CoreDNS pods running smoothly with the current topology set to ScheduleAnyway. If we changed the parameter to DoNotSchedule, new nodes would be created unnecessarily just to satisfy the topology. It's possible that other users could encounter the same scenario. What are your thoughts?
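To make the trade-off concrete, here is an illustrative sketch (not part of any existing kops API) of what a small cluster would want if the constraint were customizable:

```yaml
# Illustrative only: why one default does not fit both cluster sizes.
# whenUnsatisfiable semantics in the Kubernetes scheduler:
#   DoNotSchedule  - hard: pods violating maxSkew stay Pending, which can make an
#                    autoscaler such as Karpenter add nodes just to place them.
#   ScheduleAnyway - soft: the scheduler only tries to minimize the skew.

# Small clusters (2-3 nodes): never co-locate all CoreDNS pods on one node.
topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        k8s-app: kube-dns
    maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
```

Large clusters with many replicas would keep `whenUnsatisfiable: ScheduleAnyway` so the autoscaler does not create extra nodes just to satisfy the skew.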
@hakman ping :D
@thiagoluiznunes https://github.com/kubernetes/kops/pull/17472 change should add the desired behaviour to CoreDNS. Sorry for the delay. /close
@hakman: Closed this PR.
In response to this:
@thiagoluiznunes https://github.com/kubernetes/kops/pull/17472 change should add the desired behaviour to CoreDNS. Sorry for the delay. /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.