[1.8.0] coredns does not start with talosctl cluster create

Open alongwill opened this issue 1 year ago • 3 comments

Bug Report

Description

When creating a cluster with talosctl and Docker, coredns does not become ready: the Pods stay at 0/1 Running and cluster creation ends with "context deadline exceeded".

talosctl cluster create
View logs
talosctl cluster create
validating CIDR and reserving IPs
generating PKI and tokens
creating network talos-default
creating controlplane nodes
creating worker nodes
waiting for API
bootstrapping cluster
waiting for etcd to be healthy: OK
waiting for etcd members to be consistent across nodes: OK
waiting for etcd members to be control plane nodes: OK
waiting for apid to be ready: OK
waiting for all nodes memory sizes: OK
waiting for all nodes disk sizes: OK
waiting for no diagnostics: OK
waiting for kubelet to be healthy: OK
waiting for all nodes to finish boot sequence: OK
waiting for all k8s nodes to report: OK
waiting for all control plane static pods to be running: OK
◱ waiting for all control plane components to be ready: expected number of pods for kube-scheduler to be 1, got 0
waiting for all control plane components to be ready: OK
waiting for all k8s nodes to report ready: OK
waiting for kube-proxy to report ready: OK
◳ waiting for coredns to report ready: no ready pods found for namespace "kube-system" and label selector "k8s-app=kube-dns"
context deadline exceeded

Logs

coredns Pods are not ready.

View pod statuses
kubectl get pods -A
NAMESPACE     NAME                                                   READY   STATUS    RESTARTS      AGE
kube-system   coredns-68d75fd545-bwtqm                               0/1     Running   0             12m
kube-system   coredns-68d75fd545-m27zf                               0/1     Running   0             12m
kube-system   kube-apiserver-talos-default-controlplane-1            1/1     Running   0             12m
kube-system   kube-controller-manager-talos-default-controlplane-1   1/1     Running   2 (13m ago)   11m
kube-system   kube-flannel-hlghk                                     1/1     Running   0             12m
kube-system   kube-flannel-zrgsw                                     1/1     Running   0             12m
kube-system   kube-proxy-6kfvc                                       1/1     Running   0             12m
kube-system   kube-proxy-fsxf9                                       1/1     Running   0             12m
kube-system   kube-scheduler-talos-default-controlplane-1            1/1     Running   2 (13m ago)   11m

coredns Pod logs

View coredns pod logs
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[2008292454]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229 (02-Oct-2024 09:42:02.185) (total time: 30003ms):
Trace[2008292454]: ---"Objects listed" error:Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30002ms (09:42:32.187)
Trace[2008292454]: [30.003012847s] [30.003012847s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[219620330]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229 (02-Oct-2024 09:42:03.343) (total time: 30003ms):
Trace[219620330]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30003ms (09:42:33.346)
Trace[219620330]: [30.003374722s] [30.003374722s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1005043231]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229 (02-Oct-2024 09:42:18.890) (total time: 30004ms):
Trace[1005043231]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30004ms (09:42:48.895)
Trace[1005043231]: [30.00420643s] [30.00420643s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout

View coredns pod events
kubectl describe -n kube-system po coredns-68d75fd545-bwtqm
...
Events:
  Type     Reason                  Age                  From               Message
  ----     ------                  ----                 ----               -------
  Warning  FailedScheduling        16m                  default-scheduler  no nodes available to schedule pods
  Warning  FailedScheduling        16m                  default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
  Normal   Scheduled               15m                  default-scheduler  Successfully assigned kube-system/coredns-68d75fd545-bwtqm to talos-default-worker-1
  Warning  FailedCreatePodSandBox  15m                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "73940f0a8b5724ae1fe13c750a886d2a0e9c39ac254ee82c6cc08462dee1cb3f": plugin type="flannel" failed (add): loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory
  Normal   Pulling                 15m                  kubelet            Pulling image "registry.k8s.io/coredns/coredns:v1.11.3"
  Normal   Pulled                  15m                  kubelet            Successfully pulled image "registry.k8s.io/coredns/coredns:v1.11.3" in 5.508s (5.508s including waiting). Image size: 16948420 bytes.
  Normal   Created                 15m                  kubelet            Created container coredns
  Normal   Started                 15m                  kubelet            Started container coredns
  Warning  Unhealthy               41s (x103 over 15m)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 503

Environment

  • Talos version: [talosctl version --nodes <problematic nodes>]
talosctl version -n 10.5.0.2
Client:
	Tag:         v1.8.0
	SHA:         5cc935f7
	Built:
	Go version:  go1.22.7
	OS/Arch:     darwin/arm64
Server:
	NODE:        10.5.0.2
	Tag:         v1.8.0
	SHA:         5cc935f7
	Built:
	Go version:  go1.22.7
	OS/Arch:     linux/arm64
	Enabled:     RBAC
  • Kubernetes version: [kubectl version --short]
kubectl version
Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.31.1
  • Platform: Apple Silicon
arch
arm64
  • Docker runtime:
docker version
Client: Docker Engine - Community
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        f0df350
 Built:             Wed Jun  2 11:56:40 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       b0f5bc3
  Built:            Wed Jun  2 11:54:48 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.6
  GitCommit:        d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc:
  Version:          1.0.0-rc95
  GitCommit:        b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

alongwill, Oct 02 '24 09:10