
Warning DNSConfigForming kubelet Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is:

Open bensch98 opened this issue 3 years ago • 1 comment

What happened?

When I restore my Kubernetes cluster from an etcd backup, everything works fine except that CoreDNS now shows warnings. CoreDNS is still running, but it never reaches the Ready state:

default        test-deployment-7968d6985c-567nv   1/1     Running   0          5m46s
kube-flannel   kube-flannel-ds-gdtmd              1/1     Running   2          33m
kube-system    coredns-558bd4d5db-4hrqc           0/1     Running   0          33m
kube-system    coredns-558bd4d5db-qf6tl           0/1     Running   0          33m
kube-system    etcd-blub                          1/1     Running   0          34m
kube-system    kube-apiserver-blub                1/1     Running   0          34m
kube-system    kube-controller-manager-blub       1/1     Running   0          11m
kube-system    kube-proxy-58vxx                   1/1     Running   0          11m
kube-system    kube-scheduler-blub                1/1     Running   0          34m

I have only one master node, which I use to test and understand etcdctl's functionality.

What did you expect to happen?

I expect CoreDNS to behave the same as it did at the time of the backup.

How can we reproduce it (as minimally and precisely as possible)?

I backed up my K8s cluster like this:

mkdir -p $HOME/k8s-backups/backup-1
cd $HOME/k8s-backups/backup-1
mkdir backup-certs backup-etcd

cp -r /etc/kubernetes/pki backup-certs

etcdctl snapshot save backup-etcd/snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
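
Before restoring, the snapshot can be sanity-checked; this is an optional step that is not part of the original report. On etcd 3.5, `etcdctl snapshot status` still works but is deprecated in favour of `etcdutl`:

# Optional: print hash, revision, total keys, and size of the snapshot.
etcdctl snapshot status backup-etcd/snapshot.db -w table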

And I restore it like this:

IP=192.168.0.7
# shutdown running cluster
kubeadm reset --force
rm -rf /etc/cni/net.d
rm -rf $HOME/.kube/config

# restore from backup
cd $HOME/k8s-backups/backup-1
cp backup-certs/pki/ca.key /etc/kubernetes/pki
cp backup-certs/pki/ca.crt /etc/kubernetes/pki
# restore etcd
cd backup-etcd
etcdctl snapshot restore snapshot.db \
  --name m1 \
  --initial-cluster m1=http://$IP:2380 \
  --initial-advertise-peer-urls http://$IP:2380
mv default.etcd/member /var/lib/etcd/

# restore from existing etcd
kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd
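
Since `kubeadm reset` deletes /etc/cni/net.d, the CNI plugin has to be redeployed after the new `kubeadm init` before the node leaves NotReady and CoreDNS can pass its readiness probe; the NetworkNotReady events further below point in the same direction. A hedged follow-up check, assuming flannel as in the pod list above and that kubectl is pointed at the new admin.conf:

# Confirm the CNI config was recreated and the node is Ready again;
# CoreDNS stays unready until the pod network is up.
ls /etc/cni/net.d
kubectl get nodes
kubectl -n kube-flannel get pods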

Anything else we need to know?

coredns configmap

apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2022-09-14T08:09:18Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "221"
  uid: ff28ccb3-c985-4bd7-901e-8401d24bcfd9
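
With `forward . /etc/resolv.conf`, CoreDNS inherits the node's resolvers, and the kubelet only applies the first three nameservers; that is exactly what the DNSConfigForming warning in the events below reports. A quick check on the node, sketched here under the assumption of a plain glibc-style resolv.conf:

# More than three nameserver entries here explains the DNSConfigForming
# warning; the failing readiness probe is most likely a separate issue.
grep -c '^nameserver' /etc/resolv.conf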

coredns pod description

Name:                 coredns-558bd4d5db-4hrqc
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 blub/192.168.0.7
Start Time:           Wed, 14 Sep 2022 10:09:53 +0200
Labels:               k8s-app=kube-dns
                      pod-template-hash=558bd4d5db
Annotations:          <none>
Status:               Running
IP:                   10.244.0.5
IPs:
  IP:           10.244.0.5
Controlled By:  ReplicaSet/coredns-558bd4d5db
Containers:
  coredns:
    Container ID:  docker://ac269a83224be4f8eb1724dd5b88e58cc74cfb301857848c987f9a9bcd1af9b7
    Image:         k8s.gcr.io/coredns/coredns:v1.8.0
    Image ID:      docker-pullable://k8s.gcr.io/coredns/coredns@sha256:cc8fb77bc2a0541949d1d9320a641b82fd392b0d3d8145469ca4709ae769980e
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Running
      Started:      Wed, 14 Sep 2022 10:31:59 +0200
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dbqvs (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  kube-api-access-dbqvs:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule
                             node-role.kubernetes.io/master:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                     From               Message
  ----     ------            ----                    ----               -------
  Warning  FailedScheduling  29m (x4 over 29m)       default-scheduler  0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
  Normal   Scheduled         29m                     default-scheduler  Successfully assigned kube-system/coredns-558bd4d5db-4hrqc to blub 
  Normal   Pulled            29m                     kubelet            Container image "k8s.gcr.io/coredns/coredns:v1.8.0" already present on machine
  Normal   Created           29m                     kubelet            Created container coredns
  Normal   Started           29m                     kubelet            Started container coredns
  Warning  DNSConfigForming  10m (x20 over 29m)      kubelet            Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a02:2457:30c:101::11 2a02:2457:10c:101::53 195.234.128.139
  Warning  FailedMount       7m23s                   kubelet            MountVolume.SetUp failed for volume "config-volume" : failed to sync configmap cache: timed out waiting for the condition
  Warning  NetworkNotReady   7m21s (x3 over 7m24s)   kubelet            network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
  Normal   Pulled            7m18s                   kubelet            Container image "k8s.gcr.io/coredns/coredns:v1.8.0" already present on machine
  Normal   Created           7m18s                   kubelet            Created container coredns
  Normal   Started           7m18s                   kubelet            Started container coredns
  Warning  DNSConfigForming  6m13s (x5 over 7m19s)   kubelet            Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 2a02:2457:30c:101::11 2a02:2457:10c:101::53 195.234.128.139
  Warning  Unhealthy         2m23s (x31 over 7m17s)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 503
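
A possible next diagnostic step, not taken from the original report: the 503 responses from /ready together with the NetworkNotReady events suggest inspecting the CoreDNS logs and the node conditions rather than etcd itself.

# Why does /ready return 503, and is the node network still NotReady?
kubectl -n kube-system logs coredns-558bd4d5db-4hrqc
kubectl describe node blub | grep -A8 'Conditions:'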

Etcd version (please run commands below)

$ etcd --version
etcd Version: 3.5.4
Git SHA: 08407ff76
Go Version: go1.16.15
Go OS/Arch: linux/amd64

$ etcdctl version
etcdctl version: 3.5.4
API version: 3.5

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
{"level":"warn","ts":"2022-09-14T10:47:36.851+0200","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000172380/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection closed"}
Error: context deadline exceeded
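
One possible explanation for the error above, offered only as an assumption, is that `etcdctl member list` was run without the TLS flags that the snapshot command used; retrying with the same client certificate would rule that out:

etcdctl member list -w table \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key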

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output

No response

bensch98 · Sep 14 '22 08:09

It seems that something is wrong with the CNI rather than with etcd. I suggest you raise this question in the kubeadm community.

ahrtr · Sep 15 '22 22:09