k3s
Allow manual override or scaling of CoreDNS replica count
Is your feature request related to a problem? Please describe.
When deploying a multi-node cluster, CoreDNS is hard-coded to 1 replica. This can introduce concerns around HA, scalability, and cluster DNS outages during deployment upgrades.

Describe the solution you'd like
Allow this to be mitigated by deploying or scaling CoreDNS with a minimum of 2 replicas.

Describe alternatives you've considered
Manually deploying CoreDNS or coredns-autoscaler with min: 2
Defining topologySpreadConstraints and pod disruption budget on the coredns manifest would be a good start: https://github.com/k3s-io/k3s/blob/58315fe135101f1a06bf439687c2be9da692648f/manifests/coredns.yaml
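For reference, a minimal sketch of the PodDisruptionBudget part (illustrative values, not the packaged manifest; it only makes sense once coredns runs with 2+ replicas):

```yaml
# Sketch: keep at least one coredns pod available during voluntary disruptions.
# Only useful once coredns has been scaled to 2 or more replicas.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: coredns
  namespace: kube-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns
```

And a topologySpreadConstraints fragment that would be merged into the coredns Deployment's spec.template.spec so replicas land on different nodes:

```yaml
# Fragment for spec.template.spec of the coredns Deployment (sketch):
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        k8s-app: kube-dns
```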
Thanks @marcbachmann, potentially another option is the preventSinglePointFailure with the autoscaler. This does add another deployment however.
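For anyone going that route, a rough sketch of the linear-mode parameters the cluster-proportional-autoscaler reads from its ConfigMap (the ConfigMap name here is only an example and must match the autoscaler's --configmap flag; values are illustrative):

```yaml
# Sketch: min 2 plus preventSinglePointFailure keeps at least two coredns
# replicas whenever the cluster has more than one schedulable node
# (spreading across nodes still depends on scheduling constraints).
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-autoscaler
  namespace: kube-system
data:
  linear: |
    {
      "coresPerReplica": 256,
      "nodesPerReplica": 16,
      "min": 2,
      "preventSinglePointFailure": true
    }
```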
Currently evaluating k3s to replace our kubeadm setup. So far I have mostly found problems regarding high availability. This issue is the most prominent to me because when the node with the DNS pod goes down, the whole cluster basically becomes useless for our workloads.
I couldn't find any configuration option for this so far. Has anyone worked on an HA setup for k3s that takes into account a controller node going down? Is this even of interest to the project? I am wondering because this issue has been open for a while, and it seems necessary for a multi-controller setup to me. Are patches welcome for this?
K3s doesn't offer much flexibility in the coredns configuration at the moment, since we ship a flat manifest and not a HelmChart that can be customized. I think the most common thing for folks to do is copy the packaged CoreDNS manifest, make their changes, and then restart k3s with --disable=coredns so that only their modified configuration is used.
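If you use a config file rather than CLI flags, the disable entry would look something like this (a sketch, assuming the usual /etc/rancher/k3s/config.yaml location on each server):

```yaml
# /etc/rancher/k3s/config.yaml (sketch) -- equivalent to --disable=coredns
disable:
  - coredns
```

The modified copy of coredns.yaml can then be dropped into /var/lib/rancher/k3s/server/manifests/ under a different file name so the deploy controller applies it in place of the packaged one.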
@zimmski one option is to leave defaults in place, and add a manifest file containing the coredns-autoscaler deployment to manage the scaling of coredns automatically.
- https://kubernetes.io/docs/tasks/administer-cluster/dns-horizontal-autoscaling/#enabling-dns-horizontal-autoscaling
As an example, I've used this before with k3s.
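For anyone looking for a starting point, here is a trimmed sketch of such an add-on as it might be dropped into /var/lib/rancher/k3s/server/manifests/. The image tag is only an example, and the ServiceAccount/ClusterRole RBAC from the linked docs is omitted for brevity:

```yaml
# Sketch: cluster-proportional-autoscaler scaling the packaged coredns Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns-autoscaler
  namespace: kube-system
  labels:
    k8s-app: coredns-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: coredns-autoscaler
  template:
    metadata:
      labels:
        k8s-app: coredns-autoscaler
    spec:
      serviceAccountName: coredns-autoscaler   # RBAC omitted here; see the linked docs
      containers:
        - name: autoscaler
          image: registry.k8s.io/cpa/cluster-proportional-autoscaler:1.8.8   # tag is an example
          command:
            - /cluster-proportional-autoscaler
            - --namespace=kube-system
            - --configmap=coredns-autoscaler
            - --target=Deployment/coredns
            - --default-params={"linear":{"coresPerReplica":256,"nodesPerReplica":16,"min":2,"preventSinglePointFailure":true}}
            - --logtostderr=true
            - --v=2
```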
Thanks for your suggestions. An autoscaler is, I think, the best option. What I am curious about and want to understand is why k3s does have these HA changes in the first place. Is this a matter of missing contributions, or should the default of the project be to use as few resources as possible?
> why k3s does have these HA changes in the first place
I'm not sure what you're asking here.
> should the default of the project be to use as few resources as possible
Yes, as the project readme says, the goal of K3s is to be:

> Lightweight Kubernetes. Production ready, easy to install, half the memory, all in a binary less than 100 MB.
I've had lookup failures in my 3-node cluster which can be resolved by changing replicas to 3. So why not use a DaemonSet for coredns? Or am I missing something?
Running coredns on every node would incur unnecessary overhead on a distro that is focused on resource-constrained nodes. If you are experiencing DNS failures when the coredns pod is not running on the same node as your workload, you are most likely experiencing drops in CNI traffic between nodes. This is commonly caused by blocked vxlan ports, or issues with tx checksum offload corrupting packets. I would recommend fixing that, rather than just scheduling more replicas.
Makes sense. I checked all interfaces on all nodes and cni0 has no drops or errors, but I've seen output discards on flannel.1 on one node every full hour. Maybe this bug? https://github.com/flannel-io/flannel/issues/1009
Could you please help me here?
@brightdroid please open another issue to track your problem; let's keep this one focused on the initial ask of being able to set the replica count.
As of https://github.com/k3s-io/k3s/pull/6552 none of our packaged manifests should specify a replica count, so you should be able to scale it without having it reset when the servers restart. I see that the coredns replica count was commented out a while back, so I suspect that this has actually been resolved for a while.
Validated on release-1.25 branch with commit 457e5e7379821db3feed65548fb7678345a73828
and master branch with commit b5d39df9294627cbfa3081acb92e2be54f02b0d6
Environment Details
Infrastructure
- [x] Cloud (AWS)
- [ ] Hosted
Node(s) CPU architecture, OS, and Version:
Ubuntu 22.04 LTS
Cluster Configuration:
3 servers, 1 agent
Config.yaml:
N/A
Additional files:
N/A
Testing Steps
- Install k3s and join all nodes
- Scale and edit `local-path-provisioner` and `coredns`
- Reboot the nodes or restart the k3s service on all the nodes

Replication Results:
- `local-path-provisioner` updated upon restart to only have 1 replica
- `coredns` kept its replicas, but maintained revision history so there would be multiple replicasets
```
$ k get deploy,rs -n kube-system -o wide
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deployment.apps/coredns 2/2 2 2 163m coredns rancher/mirrored-coredns-coredns:1.9.4 k8s-app=kube-dns
deployment.apps/local-path-provisioner 1/1 1 1 163m local-path-provisioner rancher/local-path-provisioner:v0.0.23 app=local-path-provisioner
deployment.apps/metrics-server 1/1 1 1 163m metrics-server rancher/mirrored-metrics-server:v0.6.1 k8s-app=metrics-server
deployment.apps/traefik 1/1 1 1 162m traefik rancher/mirrored-library-traefik:2.9.4 app.kubernetes.io/instance=traefik-kube-system,app.kubernetes.io/name=traefik
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
replicaset.apps/coredns-57557ff85b 0 0 0 44m coredns rancher/mirrored-coredns-coredns:1.9.4 k8s-app=kube-dns,pod-template-hash=57557ff85b
replicaset.apps/coredns-597584b69b 0 0 0 163m coredns rancher/mirrored-coredns-coredns:1.9.4 k8s-app=kube-dns,pod-template-hash=597584b69b
replicaset.apps/coredns-9996b5795 2 2 2 39m coredns rancher/mirrored-coredns-coredns:1.9.4 k8s-app=kube-dns,pod-template-hash=9996b5795
replicaset.apps/local-path-provisioner-79f67d76f8 1 1 1 163m local-path-provisioner rancher/local-path-provisioner:v0.0.23 app=local-path-provisioner,pod-template-hash=79f67d76f8
replicaset.apps/metrics-server-5c8978b444 1 1 1 163m metrics-server rancher/mirrored-metrics-server:v0.6.1 k8s-app=metrics-server,pod-template-hash=5c8978b444
replicaset.apps/traefik-bb69b68cd 1 1 1 162m traefik rancher/mirrored-library-traefik:2.9.4 app.kubernetes.io/instance=traefik-kube-system,app.kubernetes.io/name=traefik,pod-template-hash=bb69b68cd
```
Validation Results:
- `local-path-provisioner` maintained the data from the scale and edit (2 replicas in my case)
- `coredns` kept its replicas AND there were no empty replicasets present:
```
$ k get deploy,rs -n kube-system -o wide
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deployment.apps/coredns 2/2 2 2 162m coredns rancher/mirrored-coredns-coredns:1.9.4 k8s-app=kube-dns
deployment.apps/local-path-provisioner 2/2 2 2 162m local-path-provisioner rancher/local-path-provisioner:v0.0.23 app=local-path-provisioner
deployment.apps/metrics-server 1/1 1 1 162m metrics-server rancher/mirrored-metrics-server:v0.6.2 k8s-app=metrics-server
deployment.apps/traefik 1/1 1 1 161m traefik rancher/mirrored-library-traefik:2.9.4 app.kubernetes.io/instance=traefik-kube-system,app.kubernetes.io/name=traefik
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
replicaset.apps/coredns-9996b5795 2 2 2 26m coredns rancher/mirrored-coredns-coredns:1.9.4 k8s-app=kube-dns,pod-template-hash=9996b5795
replicaset.apps/local-path-provisioner-79f67d76f8 2 2 2 162m local-path-provisioner rancher/local-path-provisioner:v0.0.23 app=local-path-provisioner,pod-template-hash=79f67d76f8
replicaset.apps/metrics-server-5f9f776df5 1 1 1 162m metrics-server rancher/mirrored-metrics-server:v0.6.2 k8s-app=metrics-server,pod-template-hash=5f9f776df5
replicaset.apps/traefik-66c46d954f 1 1 1 161m traefik rancher/mirrored-library-traefik:2.9.4 app.kubernetes.io/instance=traefik-kube-system,app.kubernetes.io/name=traefik,pod-template-hash=66c46d954f
```