
Pull after 1h doesn't work

Open cetiberiojr opened this issue 5 years ago • 13 comments

Hi,

I'm having a problem pulling a huge image from my private registry.

Running describe as I start my deployment, I can see the image being pulled... this goes on for a long time (since I don't have it in my local cache, that's ok...), but after some time I cannot see the Events anymore and the pod is stuck at "ContainerCreating".

1 - Describe output after the pull has been running for some time:

Events:
  Type    Reason     Age   From                   Message
  ----    ------     ---   ----                   -------
  Normal  Scheduled        default-scheduler      Successfully assigned default/baremetal-7f8b566785-wmwzg to master-node
  Normal  Pulling    42m   kubelet, master-node   Pulling image "xxx:latest"

P.S.: I didn't share the image name because of company policy.

2 - Describe after more than 1h:

Events:

3 - Pod status:

NAME                         READY   STATUS              RESTARTS   AGE
baremetal-7f8b566785-wmwzg   0/1     ContainerCreating   0          84m

It's important to note that with a smaller image I can pull it and the pod starts without any problem (I have the endpoint and user/pass configured correctly).

Tarball of inspect attached.

inspection-report-20200413_195412.tar.gz

cetiberiojr avatar Apr 13 '20 23:04 cetiberiojr

Hi @cetiberiojr, I think you will need to configure containerd to be able to resolve the private registry you have. Please edit the configuration file under /var/snap/microk8s/current/args/containerd-template.toml and restart MicroK8s with microk8s.stop; microk8s.start.
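For reference, a mirror entry in that template usually looks something like the sketch below; my-registry.example.com:5000 is a placeholder for your own registry, and on older MicroK8s releases the section prefix is plugins.cri.registry rather than plugins."io.containerd.grpc.v1.cri".registry:

  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."my-registry.example.com:5000"]
    endpoint = ["https://my-registry.example.com:5000"]

After editing, microk8s.stop; microk8s.start regenerates the actual containerd configuration from the template.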

ktsakalozos avatar Apr 14 '20 06:04 ktsakalozos


Hi @ktsakalozos, I already have it configured... I know it's working because other images from the same registry are downloaded fine. The problem is that this particular image takes more than 1h to download, and the pull stops without any error/message.

As a workaround, I'm running microk8s.ctr images pull xxx to download the image.
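(If the registry requires authentication, ctr can also take credentials on the command line, e.g. microk8s.ctr images pull --user myuser:mypass my-registry.example.com/xxx:latest, where the user, password, and registry host are placeholders for your own values.)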

cetiberiojr avatar Apr 14 '20 15:04 cetiberiojr

I am not sure if this is related, but I am running into a similar issue. Pods are stuck in the ContainerCreating state and remain this way indefinitely. Additionally, events seem to stop after the Pulling event. Other pods pull their images just fine; the problem appears to be intermittent.

One strange thing I noticed is the local registry (which I enabled via the addons) stops responding periodically. If I try and curl the catalog, I get the following:

foo@ubuntu1910:~$ curl -X GET http://localhost:32000/v2/_catalog
curl: (7) Failed to connect to localhost port 32000: Connection refused

An identical request a few moments later lists the images in my registry. The pods are also no longer stuck after this point (but sometimes fail again a short while later).
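If the registry here is the one from the MicroK8s add-on, it runs as a regular pod (in the container-registry namespace, if I remember correctly), so one thing worth checking while the connection refusals are happening is whether that pod itself is restarting or being evicted:

  microk8s.kubectl get pods -n container-registry
  microk8s.kubectl describe deploy registry -n container-registry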

louislang avatar Apr 28 '20 05:04 louislang

I have a similar issue to LouisLang's: pods are stuck in ContainerCreating, but I'm able to open http://localhost:32000/v2/_catalog in the browser and it responds correctly. I had to restart MicroK8s a few times for the containers to be created.

gfbett avatar May 07 '20 13:05 gfbett

Please attach the tarball produced by the microk8s.inspect command.

ktsakalozos avatar May 07 '20 14:05 ktsakalozos

I need to clean some info that I cannot share before uploading the tarball. I'll create a minimal reproduction and attach the report. In the meantime, in the logs I see this:

May 07 10:50:27 mycomputer microk8s.daemon-kubelet[27650]: E0507 10:50:27.664527 27650 kuberuntime_image.go:50] Pull image "localhost:32000/myservice:1.0.0-SNAPSHOT" failed: rpc error: code = Unknown desc = failed to resolve image "localhost:32000/myservice:1.0.0-SNAPSHOT": no available registry endpoint: failed to do request: Head "http://localhost:32000/v2/myservice/manifests/1.0.0-SNAPSHOT": dial tcp 127.0.0.1:32000: connect: no route to host

I think it's related to the VPN, which sets up a man-in-the-middle proxy. The issue seems to appear when I connect to or disconnect from the VPN.
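As a quick check the next time the "no route to host" error appears (nothing MicroK8s-specific, just to see whether the NodePort is reachable at all and whether a firewall rule added by the VPN client is rejecting traffic to it):

  curl -v http://localhost:32000/v2/
  sudo iptables -L -n | grep 32000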

gfbett avatar May 07 '20 16:05 gfbett

I also had this problem and fought it for about an hour - both from the local registry and from a remote one (it initially failed on an init container pull: busybox).

Upon running microk8s inspect, I did initially get this warning:

WARNING:  Docker is installed. 
File "/etc/docker/daemon.json" does not exist. 
You should create it and add the following lines: 
{
    "insecure-registries" : ["localhost:32000"] 
}

but this problem seemed to appear out of nowhere on a system that had been working fine for weeks. Editing that file and running sudo systemctl restart docker did not fix the problem, and neither did a host machine reboot; however, microk8s stop followed by microk8s start did fix it.

I have attached 4 different inspection tarballs - the earlier timestamped ones were when the problem was occurring, the last one is after a restart and the problem resolved itself.

inspection-report-20210202_162608.tar.gz inspection-report-20210202_162748.tar.gz inspection-report-20210202_163258.tar.gz inspection-report-20210202_165700.tar.gz

kevin-david avatar Feb 02 '21 17:02 kevin-david

I'm having this issue with a 48 MB image from Docker Hub, and sometimes even with busybox... It seems to come and go randomly.

inspection-report-20210609_152643.tar.gz

I have Docker installed and had the insecure-registries entry in /etc/docker/daemon.json. If I removed the file, MicroK8s wouldn't start. However, there was no registry running on localhost:32000.

I tried enabling the registry on MicroK8s and adding the registry to the Docker config. Restarted everything. Still, the node is stuck in the NotReady state.

MatTerra avatar Jun 09 '21 15:06 MatTerra

Hello, did you fix this problem? I've got the same problem too: pod stuck in ContainerCreating...

BATTLEROYALXS avatar Jul 21 '21 21:07 BATTLEROYALXS

I ended up giving up on MicroK8s and switching to minikube, and haven't had a problem since. It required writing a systemd unit, which looks like this:

################################################################################
# A template for a minikube systemd service
# Can be put somewhere like /etc/systemd/system/minikube.service
# Then enabled/started with `systemctl enable minikube` and `systemctl start minikube`
################################################################################
[Unit]
Description=minikube

Wants=network.target
After=syslog.target network-online.target

[Service]
User=<some_user>
Group=docker
# `minikube start` exits once the cluster is up, so run the unit as a oneshot
# and keep it marked active afterwards. (systemd rejects Restart= values other
# than "no" for Type=oneshot services, so there is no Restart= setting here.)
Type=oneshot
RemainAfterExit=yes
KillMode=process
ExecStart=/usr/bin/env minikube start --driver=docker
ExecStop=/usr/bin/env minikube stop

[Install]
WantedBy=multi-user.target
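Assuming the unit is saved as /etc/systemd/system/minikube.service, it can then be activated with:

  sudo systemctl daemon-reload
  sudo systemctl enable --now minikube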

kevin-david avatar Jul 21 '21 21:07 kevin-david

Yes, I did. I had to enable the registry add-on, if I'm not mistaken. @BATTLEROYALXS
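(For reference, assuming this refers to the built-in add-on: it is enabled with microk8s enable registry, and Docker on the host then needs the insecure-registries entry for localhost:32000 shown earlier in this thread.)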

MatTerra avatar Jul 21 '21 22:07 MatTerra

minikube in production?? Alright. In the end I stayed on MicroK8s; the problem was on my side, wrong CPU settings in the configuration. Pretty strange, because no logs clearly showed it.

BATTLEROYALXS avatar Jul 21 '21 23:07 BATTLEROYALXS

Hello, I'm experiencing a similar issue with one of my images, which is more than 10 GB. The other images are pulled properly and everything works as expected.

It takes hours to pull this one image, and in the meantime the machine becomes very slow: opening a new terminal window takes a very long time, and kubectl does not always respond, sometimes replying that the connection to the server was refused. I also noticed that the hostpath-provisioner pod keeps restarting until the image is pulled. I experience it both on my local and remote machines, with both a local and a remote registry.

I tried setting requests and limits for the container itself, but I don't think they affect the container creation process. In any case, the CPU and memory usage seems normal. The problem appears on Ubuntu 22.04.1 with MicroK8s v1.24.4 and v1.25.0. I did not experience this problem on Ubuntu 20.04 with MicroK8s v1.23.10.

inspection-report-20220915_094924.tar.gz
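A possible workaround for a very large image like this (assuming it can be built or pulled on the host with Docker first) is to sideload it into containerd instead of letting the kubelet pull it, e.g.:

  docker save my-registry.example.com/big-image:tag > big-image.tar
  microk8s ctr image import big-image.tar

The image name here is a placeholder; once the image is already in containerd's store, the pod should start without the long pull, provided imagePullPolicy is not Always.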

adrihanu avatar Sep 15 '22 08:09 adrihanu

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Aug 21 '23 16:08 stale[bot]