Packaging microk8s with a deployment in a VM image results in very long startup time, most likely due to hostname change
I'm trying to create a VM image that includes a single-node (non-HA) microk8s with a simple app deployed on it.
When the hostname stays the same, it takes k8s around a minute and a half from the moment k8s is up (`microk8s status --wait-ready` returns) until the node is actually serving the app.
When I spin up a VM based on that image with a different hostname (as will usually happen...), it takes k8s more than 8 minutes to get out of its limbo and start serving the application. The hostname change is done automatically by the cloud-init script before microk8s is even started (I stop microk8s and only then create the image).
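For reference, the boot-time sequence described above can be sketched roughly like this (a sketch only; `new-node-name` stands in for whatever name cloud-init assigns):

```shell
# cloud-init boot step sketch; 'new-node-name' is a placeholder
sudo hostnamectl set-hostname new-node-name   # hostname changes before microk8s runs
sudo microk8s start                           # microk8s was stopped when the image was built
microk8s status --wait-ready                  # returns, yet the app is not served for ~8 minutes
```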
I've tried adding `--hostname-override` to the kubelet config as part of my cloud-init script (before starting microk8s), but that doesn't seem to help; if you look at the attached logs, it's still looking for the node with the previous hostname.
This seems like a bug to me: microk8s doesn't correctly detect the hostname change and takes forever (more than 8 minutes...) to get out of this loop.
When I run `microk8s.kubectl get nodes`, I see 2 node records: the old hostname (stuck in NotReady state) and the new hostname.
The new hostname appears as Ready, but as I mentioned, microk8s only actually starts serving the application 8 minutes after startup (after `microk8s status --wait-ready` returns).
Any idea how I can overcome this long delay? As a last resort I thought of disabling the hostname change in the cloud-init script, but that isn't a nice solution at all.
Inspection report attached: inspection-report-20220508_103506.tar.gz
Thanks in advance for any assistance!
Hello @bondib.
According to this answer and this answer on Stack Overflow, changing node names is a rather uncommon use case for k8s.
As for microk8s, you could try running `microk8s reset` and re-deploying your application on the new VM, although I'm not sure this will help much time-wise.
Also note that in single-node deployments microk8s will detect network changes and reconfigure all k8s services, but on multi-node clusters changing IPs/hostnames will surely cause nodes to become unreachable.
Hello @lferran, so `--hostname-override` is not actually supported for multinode clusters?
Hi @ThanKarab

> so `--hostname-override` is not actually supported for multinode clusters?
`--hostname-override` is supported for multinode clusters. You can configure it like this on each host:

```shell
echo '--hostname-override=<hostname>' | sudo tee -a /var/snap/microk8s/current/args/kubelet
```
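Note that kubelet only picks up the flag after a restart; a hedged follow-up (the node name shown by `get nodes` should then match whatever override you set):

```shell
# confirm the flag landed in the kubelet args file
grep -- '--hostname-override' /var/snap/microk8s/current/args/kubelet
# restart so kubelet re-registers under the overridden name
sudo microk8s stop
sudo microk8s start
microk8s kubectl get nodes
```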
NOTE: The discussion in the previous messages does not revolve around setting the hostname reported by kubelet manually using the `--hostname-override` flag, but rather around how MicroK8s handles situations where the hostname of the machine changes (e.g. because of DHCP configs, or due to installing MicroK8s on a base image).
Also, for @bondib, not sure if this issue is still relevant, but my suggestion here would be to include the MicroK8s snap in the base image and then install it when the machine starts up for the first time. That would look like this:
```shell
# during image build (for channel, look at `snap info microk8s`)
snap download microk8s --channel 1.25 --target-directory /opt --basename microk8s
snap download core18 --target-directory /opt --basename core18
```

```shell
# during initial image boot
snap ack /opt/core18.assert && snap install /opt/core18.snap
snap ack /opt/microk8s.assert && snap install /opt/microk8s.snap --classic
```
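If it helps, the boot-time half can be guarded so it only runs on first boot (a sketch; the marker-file path is my own assumption, not part of MicroK8s or snapd):

```shell
#!/bin/sh
# first-boot installer sketch: install the pre-downloaded snaps exactly once
set -e
MARKER=/var/lib/microk8s-installed   # hypothetical marker file
if [ ! -f "$MARKER" ]; then
    snap ack /opt/core18.assert && snap install /opt/core18.snap
    snap ack /opt/microk8s.assert && snap install /opt/microk8s.snap --classic
    touch "$MARKER"
fi
```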
Had a similar issue when changing a machine hostname after microk8s installation. Issuing `sudo microk8s reset` did NOT help, because it said a multi-node cluster cannot be reset 🙃 What did help is this:
```shell
# have the node leave the cluster
sudo microk8s leave
# (re)start the cluster
sudo microk8s start
sudo microk8s status --wait-ready
```
It takes a few minutes for everything to settle (even after the wait-ready), but even if you enable microk8s services like dns straight afterwards, it will ultimately come ready after a bit 👌
It's as if changing the machine hostname forces microk8s to create a new node, which works in my case, but the old node is stuck in limbo (calico-node in Terminating forever).
> It's as if changing the machine hostname forces microk8s to create a new node, which works in my case, but the old node is stuck in limbo (calico-node in Terminating forever).
This happens because `calico-node` is a DaemonSet. `--hostname-override` will in fact have the effect of kubelet registering again under the new name, but the previous node registration will stay there.
From the Kubernetes control plane's point of view, this looks like a node that is not reconnecting, so it waits before deleting all DaemonSet pods on the "old" node.
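Given that, the stale registration can be cleaned up by hand, which lets the control plane garbage-collect its DaemonSet pods (`old-hostname` below is a placeholder for the previous node name):

```shell
# the old hostname shows up as NotReady
microk8s kubectl get nodes
# remove the stale node object; its stuck calico-node pod is removed with it
microk8s kubectl delete node old-hostname
```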