Warning DNSConfigForming kubelet Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is:
What happened?
When I restore my Kubernetes cluster from an etcd backup, everything works fine except that CoreDNS now shows warnings. CoreDNS is still running, but it never reaches the Ready state:
NAMESPACE      NAME                               READY   STATUS    RESTARTS   AGE
default        test-deployment-7968d6985c-567nv   1/1     Running   0          5m46s
kube-flannel   kube-flannel-ds-gdtmd              1/1     Running   2          33m
kube-system    coredns-558bd4d5db-4hrqc           0/1     Running   0          33m
kube-system    coredns-558bd4d5db-qf6tl           0/1     Running   0          33m
kube-system    etcd-blub                          1/1     Running   0          34m
kube-system    kube-apiserver-blub                1/1     Running   0          34m
kube-system    kube-controller-manager-blub       1/1     Running   0          11m
kube-system    kube-proxy-58vxx                   1/1     Running   0          11m
kube-system    kube-scheduler-blub                1/1     Running   0          34m
I have only a single master node, which I use to test and understand etcdctl's functionality.
What did you expect to happen?
I expect CoreDNS to behave the same way it did at the time of the backup.
How can we reproduce it (as minimally and precisely as possible)?
I backed up my K8s cluster like this:
mkdir -p $HOME/k8s-backups/backup-1
cd $HOME/k8s-backups/backup-1
mkdir backup-certs backup-etcd
cp -r /etc/kubernetes/pki backup-certs
etcdctl snapshot save backup-etcd/snapshot.db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
--key=/etc/kubernetes/pki/etcd/healthcheck-client.key
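As a quick sanity check of the snapshot right after saving it (not part of my original steps), the snapshot status subcommand can confirm it is readable and non-empty; on etcd 3.5 the same check is also available as etcdutl snapshot status:
# optional: confirm the snapshot is readable and contains keys
etcdctl snapshot status backup-etcd/snapshot.db -w table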
And I restore it like this:
IP=192.168.0.7
# shutdown running cluster
kubeadm reset --force
rm -rf /etc/cni/net.d
rm -rf $HOME/.kube/config
# restore from backup
cd $HOME/k8s-backups/backup-1
cp backup-certs/pki/ca.key /etc/kubernetes/pki
cp backup-certs/pki/ca.crt /etc/kubernetes/pki
# restore etcd
cd backup-etcd
etcdctl snapshot restore snapshot.db \
--name m1 \
--initial-cluster m1=http://$IP:2380 \
--initial-advertise-peer-urls http://$IP:2380
mv default.etcd/member /var/lib/etcd/
# restore from existing etcd
kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd
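Two hedged checks that may help at this point (the TLS paths are the same ones used in the backup command above; the *.etcd directory name is worth confirming because the restore was run with --name m1, which may put the data in m1.etcd rather than default.etcd):
# confirm which data directory the restore actually created before the mv
ls -d *.etcd
# confirm the member directory ended up where the static etcd pod expects it
ls /var/lib/etcd/member
# once kubeadm init has brought the static etcd pod up, check its health
etcdctl endpoint health \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
--key=/etc/kubernetes/pki/etcd/healthcheck-client.key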
Anything else we need to know?
coredns configmap
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2022-09-14T08:09:18Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "221"
  uid: ff28ccb3-c985-4bd7-901e-8401d24bcfd9
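The DNSConfigForming warning in the title comes from the kubelet, which copies at most three nameserver entries from the node's resolv.conf into pod DNS config and drops the rest. A hedged way to see what the node is feeding it (the systemd-resolved path is an assumption; the resolvConf setting in /var/lib/kubelet/config.yaml shows which file is actually used):
# count nameserver entries in the file the kubelet reads
grep -c '^nameserver' /etc/resolv.conf
# on nodes running systemd-resolved the kubelet is often pointed here instead
grep -c '^nameserver' /run/systemd/resolve/resolv.conf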
coredns pod description
Name: coredns-558bd4d5db-4hrqc
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: blub/192.168.0.7
Start Time: Wed, 14 Sep 2022 10:09:53 +0200
Labels: k8s-app=kube-dns
pod-template-hash=558bd4d5db
Annotations: <none>
Status: Running
IP: 10.244.0.5
IPs:
IP: 10.244.0.5
Controlled By: ReplicaSet/coredns-558bd4d5db
Containers:
coredns:
Container ID: docker://ac269a83224be4f8eb1724dd5b88e58cc74cfb301857848c987f9a9bcd1af9b7
Image: k8s.gcr.io/coredns/coredns:v1.8.0
Image ID: docker-pullable://k8s.gcr.io/coredns/coredns@sha256:cc8fb77bc2a0541949d1d9320a641b82fd392b0d3d8145469ca4709ae769980e
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Running
Started: Wed, 14 Sep 2022 10:31:59 +0200
Ready: False
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dbqvs (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
kube-api-access-dbqvs:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 29m (x4 over 29m) default-scheduler 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
Normal Scheduled 29m default-scheduler Successfully assigned kube-system/coredns-558bd4d5db-4hrqc to blub
Normal Pulled 29m kubelet Container image "k8s.gcr.io/coredns/coredns:v1.8.0" already present on machine
Normal Created 29m kubelet Created container coredns
Normal Started 29m kubelet Started container coredns
Warning DNSConfigForming 10m (x20 over 29m) kubelet Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a02:2457:30c:101::11 2a02:2457:10c:101::53 195.234.128.139
Warning FailedMount 7m23s kubelet MountVolume.SetUp failed for volume "config-volume" : failed to sync configmap cache: timed out waiting for the condition
Warning NetworkNotReady 7m21s (x3 over 7m24s) kubelet network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Normal Pulled 7m18s kubelet Container image "k8s.gcr.io/coredns/coredns:v1.8.0" already present on machine
Normal Created 7m18s kubelet Created container coredns
Normal Started 7m18s kubelet Started container coredns
Warning DNSConfigForming 6m13s (x5 over 7m19s) kubelet Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a02:2457:30c:101::11 2a02:2457:10c:101::53 195.234.128.139
Warning Unhealthy 2m23s (x31 over 7m17s) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
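A 503 from the readiness probe means CoreDNS's ready plugin is reporting that at least one plugin (typically the kubernetes plugin waiting to sync with the API server) has not signalled readiness. Two hedged checks, reusing the pod name and IP from the description above:
# the CoreDNS logs usually name the plugin that is blocking readiness
kubectl -n kube-system logs coredns-558bd4d5db-4hrqc
# query the readiness endpoint directly from the node (requires pod network access)
curl -s -o /dev/null -w '%{http_code}\n' http://10.244.0.5:8181/ready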
Etcd version (please run commands below)
$ etcd --version
etcd Version: 3.5.4
Git SHA: 08407ff76
Go Version: go1.16.15
Go OS/Arch: linux/amd64
$ etcdctl version
etcdctl version: 3.5.4
API version: 3.5
Etcd configuration (command line flags or environment variables)
paste your configuration here
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
$ etcdctl member list -w table
{"level":"warn","ts":"2022-09-14T10:47:36.851+0200","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000172380/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection closed"}
Error: context deadline exceeded
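This failure may simply be missing TLS flags rather than a broken etcd: a kubeadm-managed etcd requires client certificates, so a plain 127.0.0.1:2379 connection gets closed. A hedged retry using the same certificates as the backup command above:
etcdctl member list -w table \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
--key=/etc/kubernetes/pki/etcd/healthcheck-client.key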
$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here
Relevant log output
No response
It seems that there is something wrong with the CNI rather than with etcd. I suggest you raise a question in the kubeadm community.
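A hedged starting point for that CNI angle, given that the restore steps delete /etc/cni/net.d and the pod events report "cni config uninitialized" (the flannel file name below is the usual default and may differ per setup):
# the restore removes /etc/cni/net.d; flannel should recreate its config here
ls -l /etc/cni/net.d/
# expected with flannel: 10-flannel.conflist
kubectl -n kube-flannel get pods -o wide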