Debian - after ungraceful restart kube-apiserver and kubelite files are empty in /var/snap/microk8s/current/args/
Summary
I have a one node cluster running on Debian 11 with microk8s 1.25 After an ungraceful restart the cluster wasn't working, any command to microk8s return a connection was refused error. In the syslog I found the following error: debian microk8s.daemon-kubelite[2576]: Error: [--etcd-servers must be specified, service-account-issuer is a required flag, --service-account-signing-key-file and --service-account-issuer are required flags]
When looking for those settings, I saw they should be under /var/snap/microk8s/current/args/. Comparing to a clean installation, I saw that kube-apiserver and kubelite files were empty. After replacing them with the files from the clean install and restarting microk8s, the system was up and running again. What could have caused these files to be replaced/emptied out and how can I prevent such a situation again?
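To spot this state quickly after a reboot, a zero-byte check over the args directory is enough. A minimal sketch; `check_empty_args` is a helper name of my own, not part of microk8s, and the path in the comment is the one from this report:

```shell
# Hypothetical helper: list zero-byte files in a directory (non-recursive).
check_empty_args() {
  find "$1" -maxdepth 1 -type f -empty
}

# Typical use against the directory from this report:
#   check_empty_args /var/snap/microk8s/current/args
```

If the command prints kube-apiserver or kubelite, the files have been truncated and the services cannot start.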
What Should Happen Instead?
After an ungraceful restart, the system should come back up and run normally.
Reproduction Steps
Ungraceful restart
Hey @naphtalidavies, thank you for reaching out.
We have not come across a similar scenario. Were there any operations in progress at the time of the ungraceful restart, such as a snap refresh or an addon being enabled? It might be that the write/copy operations for these files were interrupted in this scenario, although this is just a guess.
If there is a consistent way of reproducing this, we can issue a bug fix for it. Other than that, the usual warnings about interrupting updates/write operations apply.
Many thanks!
Hi, in the logs I can now see the following lines:
Jul 14 12:51:48 debian microk8s.daemon-apiserver-kicker[727]: CSR change detected. Restarting the cluster-agent
Jul 14 12:51:48 debian microk8s.daemon-apiserver-kicker[1424]: error: error running snapctl: snap "microk8s" has "service-control" change in progress
Jul 14 12:51:48 debian systemd[1]: snap.microk8s.daemon-apiserver-kicker.service: Main process exited, code=exited, status=1/FAILURE
Jul 14 12:51:48 debian systemd[1]: snap.microk8s.daemon-apiserver-kicker.service: Failed with result 'exit-code'.
This is about the time of the error. Could someone please explain this error? In addition, we have some problems with the outer network in this environment; there are lots of NTP sync error messages in the log. I've also got the error "debian snapd[686]: devicemgr.go:2300: no NTP sync after 10m0s, trying auto-refresh anyway", even though I switched off refreshes.
Hi @naphtalidavies
For the snap refresh issue I opened a forum topic at https://forum.snapcraft.io/t/no-ntp-sync-trying-auto-refresh-anyway/36093; the snapd people will get back to us. It is worth mentioning how you disabled the refreshes: which exact commands did you use?
For the empty files, I would like to know the reason for the ungraceful restarts. Is it possible the node ran out of disk space?
Regarding the error: error running snapctl: snap "microk8s" has "service-control" change in progress: the microk8s.daemon-apiserver-kicker service runs a reconciliation loop. In that loop it detected an IP/network change and tried to reconfigure the K8s services, but the reconfiguration failed.
Hi, thanks for your reply.
For snap refresh, I posted on their forum as well; we used sudo snap refresh --hold.
Empty files: there was no disk issue; it was a full power shutdown.
On the error: there was an IP change, but it happened some time before the power cut; I can't recall how long before.
Hi @naphtalidavies, could you share the full logs for the apiserver-kicker via microk8s inspect? In particular, we're interested in knowing whether systemd restarted the apiserver-kicker service: from your current logs it seems it did not, but when I tried on the latest 1.25 build the service did restart after exiting.
Hi, we no longer have the environment or the logs, so I'm closing the issue. Thanks for the help.
Hi, @sachinkumarsingh092 @berkayoz I have a similar problem too,
I deployed an OVF file with microk8s running on a VM. Immediately after the deployment and powering on the host, I power off the VM. After that, I power on the VM again, and microk8s does not run.
From a check I made, the same files were emptied: kube-apiserver and kubelite in /var/snap/microk8s/current/args/ are empty.
It happens consistently: every time the machine is powered off (with microk8s running) and then powered on again.
inspection-report-20240125_105802.tar.gz
@sachinkumarsingh092 - From the inspection, we can see that the apiserver-kicker restarted as you expected
Which process is responsible for these files? Who writes to or overwrites them? Thank you.
Hi, Is there any update related to it?
I just had this happen on two different nodes on two consecutive days. As reported, both the kube-apiserver and kubelite files were empty.
The only way I could restore the nodes was to uninstall/reinstall microk8s.
Hi, when I copy those files back manually and stop/start microk8s, it works, but the files are emptied again on the next ungraceful shutdown.
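The manual copy-and-restart workaround can be scripted. A minimal sketch, assuming you keep a known-good copy of the args files somewhere; `restore_args` and both directory parameters are names of my own, not part of microk8s:

```shell
# Hypothetical helper: copy known-good args files from a backup directory
# into the live args directory. Restarting microk8s afterwards is a
# separate, manual step.
restore_args() {
  backup_dir="$1"
  args_dir="$2"
  for f in kube-apiserver kubelite; do
    cp "$backup_dir/$f" "$args_dir/$f"
  done
}

# Typical use, followed by a restart (paths assumed):
#   restore_args /root/args-backup /var/snap/microk8s/current/args
#   microk8s stop && microk8s start
```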
I'm curious: why are these configuration files being set to empty? If they are only read when microk8s starts, why are other processes opening these files for writing? Maybe that is the reason why these configuration files end up emptied?
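For what it's worth, zero-byte files after a power cut are the classic signature of a writer that truncates the target in place (open for write, then power loss before the new data reaches disk). I don't know how microk8s actually rewrites these files, but the usual defensive pattern is to write a temp file, flush it, and rename it over the target: rename is atomic on POSIX filesystems, so a crash leaves either the old or the new contents, never an empty file. A sketch with a hypothetical helper name of my own, `atomic_write`:

```shell
# Hypothetical crash-safe rewrite: stage new contents (from stdin) in a
# temp file, flush it to disk, then atomically rename it over the target.
atomic_write() {
  target="$1"
  tmp=$(mktemp "$target.XXXXXX")
  cat > "$tmp"           # stage the new contents
  sync "$tmp"            # flush file data first (GNU coreutils sync FILE)
  mv -f "$tmp" "$target" # rename(2) is atomic on POSIX filesystems
}
```

Usage would look like `printf '%s\n' "--etcd-servers=..." | atomic_write /path/to/conf`; interrupting it at any point leaves the old file intact.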