Pods stuck in unknown or ImagePullBackOff state after reboot

Open alahawash opened this issue 4 years ago • 10 comments

Whenever I reboot my machine, all pods go into either the Unknown or the ImagePullBackOff state. Deleting a pod leaves its replacement stuck in the ContainerCreating state, where it keeps pulling the image forever. Here's my microk8s inspect output:

Inspecting Certificates
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-apiserver is running
  Service snap.microk8s.daemon-apiserver-kicker is running
  Service snap.microk8s.daemon-control-plane-kicker is running
  Service snap.microk8s.daemon-proxy is running
  Service snap.microk8s.daemon-kubelet is running
  Service snap.microk8s.daemon-scheduler is running
  Service snap.microk8s.daemon-controller-manager is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
  Copy processes list to the final report tarball
  Copy snap list to the final report tarball
  Copy VM name (or none) to the final report tarball
  Copy disk usage information to the final report tarball
  Copy memory usage information to the final report tarball
  Copy server uptime to the final report tarball
  Copy current linux distribution to the final report tarball
  Copy openSSL information to the final report tarball
  Copy network configuration to the final report tarball
Inspecting kubernetes cluster
  Inspect kubernetes cluster
Inspecting juju
  Inspect Juju
Inspecting kubeflow
  Inspect Kubeflow

Building the report tarball
  Report tarball is at /var/snap/microk8s/1791/inspection-report-20201114_232232.tar.gz

The workaround I have is to disable the registry and enable it again, but then I have to push all the images again and recreate the pods, which is very frustrating. Any clues?

Inspection Report: inspection-report-20201114_232232.tar.gz

alahawash avatar Nov 14 '20 19:11 alahawash

Hi, what happens when you run microk8s stop and then microk8s start? Does it also go away?

balchua avatar Nov 17 '20 13:11 balchua

Yes, same thing as reboot.

alahawash avatar Nov 17 '20 14:11 alahawash

It usually takes a bit of time to get all the pods running. Most of your images use the latest tag, which means Kubernetes will pull them from the registry every time a pod starts. Since your registry is not yet ready or running at that point, Kubernetes will hold off and then retry with exponential backoff. Try waiting a few minutes.
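
To make the two mechanisms mentioned here concrete, below is a minimal Python sketch. It is not taken from Kubernetes or MicroK8s code. `default_pull_policy` mirrors the documented defaulting rule for `imagePullPolicy` when the field is omitted, and `pull_backoff_delays` models a doubling retry delay with a cap. The 300-second cap matches the limit given in the Kubernetes documentation; the 10-second starting delay and both function names are assumptions made for illustration.

```python
def default_pull_policy(image: str) -> str:
    """Return the imagePullPolicy Kubernetes defaults to when the field is omitted."""
    # A digest reference (name@sha256:...) pins the image, so it is IfNotPresent.
    if "@" in image:
        return "IfNotPresent"
    # Look for a tag only in the last path component, so that a registry
    # port such as localhost:32000 is not mistaken for a tag.
    last = image.rsplit("/", 1)[-1]
    tag = last.split(":", 1)[1] if ":" in last else ""
    # No tag or the "latest" tag means the registry is contacted on every start.
    return "Always" if tag in ("", "latest") else "IfNotPresent"


def pull_backoff_delays(attempts: int, base: int = 10, cap: int = 300) -> list:
    """Model a doubling retry delay in seconds (base is an assumed value)."""
    return [min(base * 2 ** i, cap) for i in range(attempts)]
```

Under these assumptions the delay between retries never exceeds five minutes, so a pod that is still stuck after hours points to something other than the backoff timer itself.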

balchua avatar Nov 17 '20 22:11 balchua

It stays the same even after waiting for hours.

alahawash avatar Nov 19 '20 07:11 alahawash

@alahawash the images should be located on your host under /var/snap/microk8s/current/common/default-storage. Just checking: you are not using multiple nodes, right?

balchua avatar Nov 22 '20 00:11 balchua

We are seeing the same issue. Sometimes we have to reboot a couple of times before it works again.

lilvinz avatar Dec 02 '20 20:12 lilvinz

@balchua actually, the images are located in /var/snap/microk8s/common/default-storage, and no, I'm not using multiple nodes.

alahawash avatar Dec 14 '20 18:12 alahawash

Did anyone ever get anywhere with this? I am seeing the same issue frequently after a reboot. Stopping and starting microk8s usually results in different pods becoming stuck in ContainerCreating. I can pull the images manually using microk8s ctr images pull <image>, but this doesn't resolve the state of the stuck pods.
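
For anyone trying to narrow down which pods and images are affected after a restart, here is a small Python sketch that scans a pod list for containers waiting on an image. It assumes the input is the parsed JSON from `microk8s kubectl get pods -A -o json`; the function name `stuck_containers` and the set of reasons it looks for are illustrative choices, not part of any MicroK8s tooling.

```python
# Waiting reasons reported in this thread, plus ErrImagePull, which
# usually precedes ImagePullBackOff.
STUCK_REASONS = {"ImagePullBackOff", "ErrImagePull", "ContainerCreating"}


def stuck_containers(pod_list: dict) -> list:
    """Return (namespace, pod, container, image, reason) for stuck containers."""
    found = []
    for pod in pod_list.get("items") or []:
        meta = pod.get("metadata") or {}
        statuses = (pod.get("status") or {}).get("containerStatuses") or []
        for cs in statuses:
            # A container that is not running or terminated has a "waiting" state.
            waiting = (cs.get("state") or {}).get("waiting")
            if waiting and waiting.get("reason") in STUCK_REASONS:
                found.append((
                    meta.get("namespace"),
                    meta.get("name"),
                    cs.get("name"),
                    cs.get("image"),
                    waiting["reason"],
                ))
    return found
```

To use it, load the command output with `json.load` and pass the resulting dictionary to `stuck_containers`. Comparing the list before and after a stop/start cycle would show whether it is always the same images that fail or, as described above, a different set each time.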

wamphlett avatar Oct 01 '21 09:10 wamphlett

We have not been able to resolve it. Restarting microk8s a couple of times usually helps.

lilvinz avatar Oct 04 '21 11:10 lilvinz

On v1.21.12 I am getting this problem once the cluster reaches about a month of uptime without rebooting or restarting anything.

Images from external registries are pulled fine, but images from the local plugin registry get stuck in pulling image status.
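
The split between external and local images can be checked mechanically. The Python sketch below extracts the registry host from an image reference using the usual Docker reference convention (the first path component counts as a registry only if it contains a dot or a colon, or is `localhost`) and compares it with the local registry endpoint. `localhost:32000` is assumed here because it is the default endpoint of the MicroK8s registry add-on; the thread does not state which address is in use, and both function names are hypothetical.

```python
# Assumption: default endpoint of the MicroK8s registry add-on.
LOCAL_REGISTRY = "localhost:32000"


def registry_host(image: str) -> str:
    """Return the registry host of an image reference, defaulting to docker.io."""
    first, sep, _ = image.partition("/")
    # Without a slash, or with a plain first component such as "library",
    # the reference has no explicit registry.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def uses_local_registry(image: str, local: str = LOCAL_REGISTRY) -> bool:
    """Tell whether an image would be pulled from the local registry."""
    return registry_host(image) == local
```

Grouping the stuck pods by `registry_host` would confirm whether every failing pull targets the local registry, which is what the observation above suggests.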

Doing microk8s stop and microk8s start had no effect.

The only thing that worked was

sudo systemctl reboot snap.microk8s.daemon-kubelet

but that's a pretty big reboot and takes a long time to recover.

Will check if this happens with v1.23.6

edemen avatar May 10 '22 22:05 edemen

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 05 '23 23:04 stale[bot]