Pods stuck in unknown state after reboot
I'm using latest/edge (with the Calico CNI), and after rebooting the machine all of my pods are in the Unknown state.
Logs of the calico node:
2020-08-27 07:46:50.152 [INFO][8] startup.go 290: Early log level set to info
2020-08-27 07:46:50.152 [INFO][8] startup.go 306: Using NODENAME environment for node name
2020-08-27 07:46:50.152 [INFO][8] startup.go 318: Determined node name: davigar15
2020-08-27 07:46:50.153 [INFO][8] startup.go 350: Checking datastore connection
2020-08-27 07:46:50.159 [INFO][8] startup.go 374: Datastore connection verified
2020-08-27 07:46:50.159 [INFO][8] startup.go 102: Datastore is ready
2020-08-27 07:46:50.170 [INFO][8] startup.go 652: Using autodetected IPv4 address on interface lxdbr0: 172.16.100.1/24
2020-08-27 07:46:50.170 [INFO][8] startup.go 715: No AS number configured on node resource, using global value
2020-08-27 07:46:50.170 [INFO][8] startup.go 171: Setting NetworkUnavailable to False
2020-08-27 07:46:50.191 [INFO][8] startup.go 764: found v6= in the kubeadm config map
2020-08-27 07:46:50.210 [INFO][8] startup.go 598: FELIX_IPV6SUPPORT is false through environment variable
2020-08-27 07:46:50.232 [INFO][8] startup.go 215: Using node name: davigar15
2020-08-27 07:46:50.274 [INFO][32] allocateip.go 144: Current address is still valid, do nothing currentAddr="10.1.245.64" type="vxlanTunnelAddress"
CALICO_NETWORKING_BACKEND is vxlan - no need to run a BGP daemon
Calico node started successfully
An interesting thing is that Calico is detecting the network used for LXD.
Following @ktsakalozos's suggestion, I added this to /var/snap/microk8s/current/args/cni-network/cni.yaml and applied that spec:
- name: IP_AUTODETECTION_METHOD
  value: "can-reach=192.168.0.0"
The calico node did not restart on its own, so I killed it to force a restart, but it did not come up even with microk8s.stop && microk8s.start.
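For anyone following along, the sequence boils down to something like this (a sketch; the k8s-app=calico-node label selector is taken from the upstream Calico manifest and is worth verifying against your cni.yaml first):

# Add the IP_AUTODETECTION_METHOD env var to the calico-node container and re-apply the manifest
sudo vi /var/snap/microk8s/current/args/cni-network/cni.yaml
microk8s kubectl apply -f /var/snap/microk8s/current/args/cni-network/cni.yaml

# The DaemonSet does not restart its running pod by itself, so delete the pod to force a restart
microk8s kubectl delete pod -n kube-system -l k8s-app=calico-node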
This is the tarball generated by microk8s.inspect
Experiencing similar issues with dashboard here. Is microk8s inspect displaying any alert?
@zar3bski could you attach the microk8s inspect tarball? It is hard to say what may be wrong.
@davigar15 the attached inspection report seems corrupted.
Here it is. inspection-report-20200908_153805.tar.gz
Since then, I have also tried to remove the pods manually (a sketch of the command is after the listing below); they have remained stuck in Pending ever since.
kubectl get -n kube-system all
NAME READY STATUS RESTARTS AGE
pod/coredns-588fd544bf-cmtch 0/1 Unknown 11 56d
pod/dashboard-metrics-scraper-59f5574d4-fnwf4 0/1 Unknown 9 55d
pod/hostpath-provisioner-75fdc8fccd-qtdz2 0/1 Unknown 11 56d
pod/kubernetes-dashboard-6d97855997-2nglv 0/1 Pending 0 9d
pod/metrics-server-c65c9d66-7tppz 0/1 Unknown 9 55d
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/dashboard-metrics-scraper ClusterIP 10.152.183.173 <none> 8000/TCP 55d
service/kube-dns ClusterIP 10.152.183.10 <none> 53/UDP,53/TCP,9153/TCP 56d
service/kubernetes-dashboard ClusterIP 10.152.183.73 <none> 443/TCP 55d
service/metrics-server ClusterIP 10.152.183.240 <none> 443/TCP 55d
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/coredns 0/1 1 0 56d
deployment.apps/dashboard-metrics-scraper 0/1 1 0 55d
deployment.apps/hostpath-provisioner 0/1 1 0 56d
deployment.apps/kubernetes-dashboard 0/1 1 0 55d
deployment.apps/metrics-server 0/1 1 0 55d
NAME DESIRED CURRENT READY AGE
replicaset.apps/coredns-588fd544bf 1 1 0 56d
replicaset.apps/dashboard-metrics-scraper-59f5574d4 1 1 0 55d
replicaset.apps/hostpath-provisioner-75fdc8fccd 1 1 0 56d
replicaset.apps/kubernetes-dashboard-6d97855997 1 1 0 55d
replicaset.apps/metrics-server-c65c9d66 1 1 0 55d
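For reference, the manual removal mentioned above was along these lines (a sketch only; force-deleting just removes the stuck API object, it does not repair the underlying networking):

# Force-delete a pod stuck in Unknown so its ReplicaSet can recreate it
kubectl delete pod -n kube-system coredns-588fd544bf-cmtch --force --grace-period=0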
inspection-report-20200917_165856.tar.gz
I also noticed something new:
WARNING: IPtables FORWARD policy is DROP. Consider enabling traffic forwarding with: sudo iptables -P FORWARD ACCEPT
The change can be made persistent with: sudo apt-get install iptables-persistent
WARNING: Docker is installed.
File "/etc/docker/daemon.json" does not exist.
You should create it and add the following lines:
{
"insecure-registries" : ["localhost:32000"]
}
and then restart docker with: sudo systemctl restart docker
Building the report tarball
I tried both fixes, but they did not change much.
Facing the same issue after a restart of the VM: the pods are in Unknown state, even though microk8s status and inspect show everything running.
$ microk8s kubectl get pods -n kube-system
NAME                                      READY   STATUS    RESTARTS   AGE
hostpath-provisioner-5c65fbdb4f-n5b6q     0/1     Unknown   2          10d
calico-kube-controllers-847c8c99d-vn9tq   0/1     Unknown   2          10d
coredns-86f78bb79c-q86wm                  0/1     Unknown   2          10d
tiller-deploy-575fcb6dfd-ljj4v            0/1     Unknown   1          10d
calico-node-8twws                         1/1     Running   6          10d
I tried restarting the service, and stopping and starting microk8s as well. I have attached the microk8s inspect tarball for reference: inspection-report-20201005_133027.tar.gz
Thank you for your patience and apologies for the inconvenience this issue may have caused.
When the node starts, it needs to invalidate the old IPs and update the pods with new ones. In the containerd logs you can see this call failing:
Oct 05 13:30:09 imsdev microk8s.daemon-containerd[31205]: time="2020-10-05T13:30:09.146826904Z" level=info msg="StopPodSandbox for \"5bfb7fe3babb8b47c627bbe9c7b67d567a86abae2b47ed36b856ec571bb6c668\""
Oct 05 13:30:09 imsdev microk8s.daemon-containerd[31205]: time="2020-10-05T13:30:09.146930076Z" level=info msg="Container to stop \"c002252db6a93d65b57330a0175059ae0520a730a90b7cef3efc77f01aeb6370\" must be in running or unknown state, current state \"CONTAINER_EXITED\""
Oct 05 13:30:09 imsdev microk8s.daemon-containerd[31205]: 2020-10-05 13:30:09.167 [ERROR][9700] customresource.go 136: Error updating resource Key=IPAMBlock(10-1-90-192-26) Name="10-1-90-192-26" Resource="IPAMBlocks" Value=&v3.IPAMBlock{TypeMeta:v1.TypeMeta{Kind:"IPAMBlock", APIVersion:"crd.projectcalico.org/v1"}, ObjectMeta:v1.ObjectMeta{Name:"10-1-90-192-26", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"1159920", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:v3.IPAMBlockSpec{CIDR:"10.1.90.192/26", Affinity:(*string)(0xc00027e1b0), StrictAffinity:false, Allocations:[]*int{(*int)(0xc0003e10d0), (*int)(0xc0003e1100), (*int)(nil), (*int)(nil), (*int)(0xc0003e1110), (*int)(nil), (*int)(nil), (*int)(0xc0003e1120), (*int)(0xc0003e1140), (*int)(0xc0003e1130), (*int)(0xc0003e10f0), (*int)(nil), (*int)(nil), (*int)(0xc0003e1220), (*int)(nil), (*int)(0xc0003e1150), (*int)(nil), (*int)(0xc0003e10e0), (*int)(nil), (*int)(nil), (*int)(nil), (*int)(0xc0003e11d0), (*int)(0xc0003e1250), (*int)(0xc0003e1240), (*int)(nil), (*int)(nil), (*int)(0xc0003e11c0), (*int)(0xc0003e1210), (*int)(nil), (*int)(0xc0003e1260), (*int)(0xc0003e11b0), (*int)(0xc0003e11a0), (*int)(nil), (*int)(nil), (*int)(0xc0003e1190), (*int)(nil), (*int)(0xc0003e1160), (*int)(0xc0003e1170), (*int)(nil), (*int)(nil), (*int)(nil), (*int)(nil), (*int)(nil), (*int)(nil), (*int)(nil), (*int)(0xc0003e1290), (*int)(nil), (*int)(0xc0003e1280), (*int)(nil), (*int)(nil), (*int)(nil), (*int)(nil), (*int)(nil), (*int)(0xc0003e12c0), (*int)(0xc0003e12a0), (*int)(nil), (*int)(nil), (*int)(nil), (*int)(0xc0003e12b0), (*int)(nil), (*int)(0xc0003e1200), (*int)(nil), (*int)(nil), (*int)(nil)}, Unallocated:[]int{51, 61, 52, 25, 59, 35, 49, 56, 48, 50, 12, 33, 3, 5, 24, 11, 18, 2, 19, 6, 20, 16, 14, 28, 40, 41, 42, 55, 43, 46, 57, 44, 39, 62, 32, 38, 63}, Attributes:[]v3.AllocationAttribute{v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027e1e0), AttrSecondary:map[string]string{"node":"imsdev", "type":"vxlanTunnelAddress"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027e230), AttrSecondary:map[string]string{"namespace":"kube-system", "node":"imsdev", "pod":"hostpath-provisioner-5c65fbdb4f-n5b6q"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027e2a0), AttrSecondary:map[string]string{"namespace":"kube-system", "node":"imsdev", "pod":"calico-kube-controllers-847c8c99d-vn9tq"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027e310), AttrSecondary:map[string]string{"namespace":"ims1", "node":"imsdev", "pod":"scscf-0"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027e380), AttrSecondary:map[string]string{"namespace":"ims1", "node":"imsdev", "pod":"pcscf-0"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027e3f0), AttrSecondary:map[string]string{"namespace":"controller-bbd82afa-8246-4004-8ae4-7c865129e0c2", "node":"imsdev", "pod":"modeloperator-7f85946d4-z9wwk"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027e460), AttrSecondary:map[string]string{"namespace":"ims2", "node":"imsdev", "pod":"modeloperator-7f9967fb56-vbtjj"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027e4d0), AttrSecondary:map[string]string{"namespace":"ims1", "node":"imsdev", "pod":"dns-0"}}, 
v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027e540), AttrSecondary:map[string]string{"namespace":"controller-imsmicro", "node":"imsdev", "pod":"modeloperator-5dffd95c85-dvk2n"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027e5b0), AttrSecondary:map[string]string{"namespace":"controller-bbd82afa-8246-4004-8ae4-7c865129e0c2", "node":"imsdev", "pod":"controller-0"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027e620), AttrSecondary:map[string]string{"namespace":"metallb-system", "node":"imsdev", "pod":"controller-559b68bfd8-5hhlk"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027e690), AttrSecondary:map[string]string{"namespace":"controller-imsmicro", "node":"imsdev", "pod":"controller-0"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027e700), AttrSecondary:map[string]string{"namespace":"ims1", "node":"imsdev", "pod":"icscf-0"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027e770), AttrSecondary:map[string]string{"namespace":"ims1", "node":"imsdev", "pod":"modeloperator-d565694b7-jhqq4"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027e7e0), AttrSecondary:map[string]string{"namespace":"kube-system", "node":"imsdev", "pod":"coredns-86f78bb79c-q86wm"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027e850), AttrSecondary:map[string]string{"namespace":"kube-system", "node":"imsdev", "pod":"tiller-deploy-575fcb6dfd-ljj4v"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027e930), AttrSecondary:map[string]string{"namespace":"ims2", "node":"imsdev", "pod":"hss-operator-0"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027e9a0), AttrSecondary:map[string]string{"namespace":"ims2", "node":"imsdev", "pod":"icscf-operator-0"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027ea10), AttrSecondary:map[string]string{"namespace":"ims2", "node":"imsdev", "pod":"mysql-operator-0"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027ea80), AttrSecondary:map[string]string{"namespace":"ims2", "node":"imsdev", "pod":"hss-0"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027eaf0), AttrSecondary:map[string]string{"namespace":"ims2", "node":"imsdev", "pod":"pcscf-operator-0"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027eb60), AttrSecondary:map[string]string{"namespace":"ims2", "node":"imsdev", "pod":"icscf-0"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027ebd0), AttrSecondary:map[string]string{"namespace":"ims2", "node":"imsdev", "pod":"scscf-operator-0"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027ec40), AttrSecondary:map[string]string{"namespace":"ims2", "node":"imsdev", "pod":"mysql-0"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027ecb0), AttrSecondary:map[string]string{"namespace":"ims2", "node":"imsdev", "pod":"pcscf-0"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027ed20), AttrSecondary:map[string]string{"namespace":"ims2", "node":"imsdev", "pod":"scscf-0"}}, v3.AllocationAttribute{AttrPrimary:(*string)(0xc00027ed90), AttrSecondary:map[string]string{"namespace":"ims2", "node":"imsdev", "pod":"dns-0"}}}, Deleted:false}} error=context deadline exceeded
Oct 05 13:30:09 imsdev microk8s.daemon-containerd[31205]: 2020-10-05 13:30:09.167 [ERROR][9700] ipam.go 1238: Error updating block '10.1.90.192/26': context deadline exceeded cidr=10.1.90.192/26 handle="k8s-pod-network.c5317a7ae8fc73bdce2d8ff68c16d45f90c822d28522e8cb502e5cca65e7d0de"
Oct 05 13:30:09 imsdev microk8s.daemon-containerd[31205]: 2020-10-05 13:30:09.167 [ERROR][9700] ipam_plugin.go 309: Failed to release address ContainerID="c5317a7ae8fc73bdce2d8ff68c16d45f90c822d28522e8cb502e5cca65e7d0de" HandleID="k8s-pod-network.c5317a7ae8fc73bdce2d8ff68c16d45f90c822d28522e8cb502e5cca65e7d0de" Workload="imsdev-k8s-dns--operator--0-eth0" error=context deadline exceeded
Oct 05 13:30:09 imsdev microk8s.daemon-containerd[31205]: 2020-10-05 13:30:09.171 [ERROR][9690] utils.go 223: context deadline exceeded ContainerID="c5317a7ae8fc73bdce2d8ff68c16d45f90c822d28522e8cb502e5cca65e7d0de"
Oct 05 13:30:09 imsdev microk8s.daemon-containerd[31205]: time="2020-10-05T13:30:09.176686468Z" level=error msg="StopPodSandbox for \"c5317a7ae8fc73bdce2d8ff68c16d45f90c822d28522e8cb502e5cca65e7d0de\" failed" error="failed to destroy network for sandbox \"c5317a7ae8fc73bdce2d8ff68c16d45f90c822d28522e8cb502e5cca65e7d0de\": context deadline exceeded"
On the API server side we see the failed call to the "admission.juju.is" webhook. This webhook is supposed to intercept the REST API call and authorize it. However, the webhook's pod is hosted in the cluster itself, so its IP is no longer correct and it cannot be reached.
Oct 05 13:30:09 imsdev microk8s.daemon-apiserver[9995]: I1005 13:30:09.166278 9995 trace.go:205] Trace[700890020]: "Update" url:/apis/crd.projectcalico.org/v1/ipamblocks/10-1-90-192-26,user-agent:Go-http-client/2.0,client:10.45.28.23 (05-Oct-2020 13:29:35.164) (total time: 34001ms):
Oct 05 13:30:09 imsdev microk8s.daemon-apiserver[9995]: Trace[700890020]: [34.001614426s] [34.001614426s] END
Oct 05 13:30:10 imsdev microk8s.daemon-apiserver[9995]: I1005 13:30:10.189575 9995 trace.go:205] Trace[613559678]: "Call mutating webhook" configuration:juju-model-admission-controller-imsmicro,webhook:admission.juju.is,resource:crd.projectcalico.org/v1, Resource=ipamblocks,subresource:,operation:UPDATE,UID:a7df1e01-a19e-4c0b-8238-22ebaca9e472 (05-Oct-2020 13:30:06.191) (total time: 3998ms):
Oct 05 13:30:10 imsdev microk8s.daemon-apiserver[9995]: Trace[613559678]: [3.998434049s] [3.998434049s] END
Oct 05 13:30:10 imsdev microk8s.daemon-apiserver[9995]: W1005 13:30:10.189656 9995 dispatcher.go:170] Failed calling webhook, failing open admission.juju.is: failed calling webhook "admission.juju.is": Post "https://modeloperator.controller-imsmicro.svc:17071/k8s/admission/c589b8d8-c5fa-46aa-888a-ea1197c1ac82?timeout=4s": context deadline exceeded
Oct 05 13:30:10 imsdev microk8s.daemon-apiserver[9995]: E1005 13:30:10.189702 9995 dispatcher.go:171] failed calling webhook "admission.juju.is": Post "https://modeloperator.controller-imsmicro.svc:17071/k8s/admission/c589b8d8-c5fa-46aa-888a-ea1197c1ac82?timeout=4s": context deadline exceeded
Oct 05 13:30:10 imsdev microk8s.daemon-apiserver[9995]: I1005 13:30:10.189922 9995 trace.go:205] Trace[1120710352]: "GuaranteedUpdate etcd3" type:*unstructured.Unstructured (05-Oct-2020 13:29:36.189) (total time: 34000ms):
Oct 05 13:30:10 imsdev microk8s.daemon-apiserver[9995]: Trace[1120710352]: [34.00001823s] [34.00001823s] END
Oct 05 13:30:10 imsdev microk8s.daemon-apiserver[9995]: E1005 13:30:10.189989 9995 status.go:71] apiserver received an error that is not an metav1.Status: context.deadlineExceededError{}
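For reference, the webhook configurations intercepting these calls can be listed with standard kubectl; the juju-model-admission-* entries from the trace above should show up there:

# List registered admission webhooks; their backing pods live in the cluster and are unreachable after the reboot
microk8s kubectl get mutatingwebhookconfigurations
microk8s kubectl get validatingwebhookconfigurations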
We have a bug opened for this at https://bugs.launchpad.net/juju/+bug/1898718
As a temporary workaround you could use Juju 2.7 until this gets addressed.
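If Juju was installed as a snap, moving back is a channel switch along these lines (the 2.7/stable channel name is an assumption):

# Switch the juju snap back to the 2.7 track
sudo snap refresh juju --channel=2.7/stable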
@ktsakalozos Currently working on a fix for Juju to resolve this. Have updated lp bug.
Fix committed in Juju. It will be available in 2.8.6.
@ktsakalozos still facing the same issue, what might be the case?
Issue still exists in 1.22.2, any ideas on how to fix this besides resetting/reinstalling?
@arnitkun could you please attach a microk8s inspect tarball?
@ktsakalozos I shall do it the next time it happens; apparently, after another restart, everything was good again.
Hi all,
I also got the same issue, but when checking the inspection report I found:
The connection to the server 127.0.0.1:16443 was refused - did you specify the right host or port?
in the k8s folder. Sometimes kubectl gives the same error randomly; I cannot think of any reason why that is. Any tips on how to check further?
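A few checks I still need to run to narrow this down (a sketch; the daemon name differs between MicroK8s versions, with newer builds running the API server inside kubelite and older ones using a separate daemon-apiserver):

microk8s status --wait-ready
sudo ss -tlnp | grep 16443     # is anything listening on the API server port?
sudo journalctl -u snap.microk8s.daemon-kubelite -n 100 --no-pager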
Not sure whether related: https://github.com/canonical/microk8s/issues/3293
I don't have Juju installed on my machine. After a reboot, all of my pods' status is Unknown. Is there any workaround?
My pods also went into Unknown state, I guess after rebooting; this happened on both of my servers, which run microk8s on CentOS 7. I posted the issue here as well: https://github.com/canonical/microk8s/issues/3545
@ktsakalozos Any idea how I can resolve it? I don't think it is related to the Docker, containerd, or kernel version, since it was working perfectly for 7-8 months!
Is it possible that the reboot caused the system to start from another kernel?
Thanks for your reply. No, I don't think so, since there is only one kernel on both of my CentOS 7 servers, kernel-3.10.0-1160.el7.x86_64. One server is connected to the internet and the other is completely isolated, and I had done nothing on it, yet after a reboot this error happened unexpectedly. I also don't think it is related to a kernel/containerd version incompatibility caused by the runc vulnerability measures. The reason is that Docker, on the same machine where microk8s (or rather, the containerd running inside microk8s) gives the error "can't copy bootstrap data to pipe: write init-p: broken pipe", can still create containers, e.g.:
# docker run -itd busybox:latest
# docker ps
CONTAINER ID   IMAGE            COMMAND   CREATED         STATUS         PORTS   NAMES
7328b5736817   busybox:latest   "sh"      3 seconds ago   Up 3 seconds           flamboyant_sanderson
The environment is the same, so if a container gets created with Docker on the same machine, with a 3.x kernel and the docker-ce and containerd versions shown below, then we can rule out the advice that says "the problem will be solved by upgrading the kernel or downgrading the Docker and containerd versions". I think this problem is related to microk8s and the containerd inside it; it seems this issue exists on Ubuntu as well: https://github.com/canonical/microk8s/issues/531
# uname -sr
Linux 3.10.0-1160.49.1.el7.x86_64

# docker version
Client:
 Version:           18.09.0
 API version:       1.39
 Go version:        go1.10.4
 Git commit:        4d60db4
 Built:             Wed Nov 7 00:48:22 2018
 OS/Arch:           linux/amd64
 Experimental:      false
Server: Docker Engine - Community
 Engine:
  Version:          18.09.0
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.4
  Git commit:       4d60db4
  Built:            Wed Nov 7 00:19:08 2018
  OS/Arch:          linux/amd64
  Experimental:     false

# yum info containerd
Installed Packages
Name        : containerd.io
Arch        : x86_64
Version     : 1.6.9
Release     : 3.1.el7
Size        : 112 M
Repo        : installed
From repo   : docker-ce-stable
Summary     : An industry-standard container runtime
URL         : https://containerd.io
# rpm -qa kernel
kernel-3.10.0-1160.el7.x86_64
kernel-3.10.0-1160.49.1.el7.x86_64
# rpm -qa | grep -i kernel
kernel-tools-libs-3.10.0-1160.49.1.el7.x86_64
kernel-tools-3.10.0-1160.49.1.el7.x86_64
kernel-3.10.0-1160.el7.x86_64
kernel-headers-3.10.0-1160.49.1.el7.x86_64
kernel-3.10.0-1160.49.1.el7.x86_64
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.