k0sctl install fails because of cert issue

Open FallingSnow opened this issue 2 years ago • 5 comments

Operating system (Control/Worker): Alpine 3.15.0

$ k0sctl apply --config k0sctl.yaml
...
INFO ==> Running phase: Install workers  
INFO [ssh] [redacted-worker-node]:22: validating api connection to https://10.0.1.1:6443 
INFO [ssh] [redacted-control-node]:22: generating token 
INFO [ssh] [redacted-worker-node]:22: writing join token 
INFO [ssh] [redacted-worker-node]:22: installing k0s worker 
INFO [ssh] [redacted-worker-node]:22: starting service 
INFO [ssh] [redacted-worker-node]:22: waiting for node to become ready 
INFO * Running clean-up for phase: Initialize the k0s cluster 
INFO * Running clean-up for phase: Install workers 
ERRO apply failed - log file saved to /home/ayrton/.k0sctl/cache/k0sctl.log 
FATA failed on 1 hosts:
 - [ssh] [redacted-worker-node]:22: [ssh] [redacted-control-node]:22: node [redacted-worker-node] status not reported as ready

[redacted-worker-node]'s /var/log/messages:

...
Mar 22 20:39:15 a8-a1-59-98-e1-33 daemon.info supervise-daemon[11670]: Child command line: /usr/local/bin/k0s worker --kubelet-extra-args=--node-ip=10.0.2.1 --token-file=/etc/k0s/k0stoken 
Mar 22 20:39:15 a8-a1-59-98-e1-33 daemon.warn supervise-daemon[11447]: /usr/local/bin/k0s, pid 11670, exited with return code 1
Mar 22 20:39:15 a8-a1-59-98-e1-33 daemon.info supervise-daemon[11697]: Child command line: /usr/local/bin/k0s worker --kubelet-extra-args=--node-ip=10.0.2.1 --token-file=/etc/k0s/k0stoken 
Mar 22 20:39:15 a8-a1-59-98-e1-33 daemon.warn supervise-daemon[11447]: /usr/local/bin/k0s, pid 11697, exited with return code 1
Mar 22 20:39:15 a8-a1-59-98-e1-33 daemon.info supervise-daemon[11726]: Child command line: /usr/local/bin/k0s worker --kubelet-extra-args=--node-ip=10.0.2.1 --token-file=/etc/k0s/k0stoken
...
Mar 22 20:39:15 a8-a1-59-98-e1-33 daemon.warn supervise-daemon[11447]: /usr/local/bin/k0s, pid 11726, exited with return code 1
Mar 22 20:39:15 a8-a1-59-98-e1-33 daemon.warn supervise-daemon[11447]: respawned "/usr/local/bin/k0s" too many times, exiting

Running the failing command directly on the worker gives:

[redacted-worker-node]:~# /usr/local/bin/k0s worker --kubelet-extra-args=--node-ip=10.0.2.1 --token-file=/etc/k0s/k0stoken
Error: failed to start kubelet config client: failed to load kubeconfig: invalid configuration: [unable to read client-cert /var/lib/k0s/kubelet/pki/kubelet-client-current.pem for default-auth due to open /var/lib/k0s/kubelet/pki/kubelet-client-current.pem: no such file or directory, unable to read client-key /var/lib/k0s/kubelet/pki/kubelet-client-current.pem for default-auth due to open /var/lib/k0s/kubelet/pki/kubelet-client-current.pem: no such file or directory]

Where are these missing certificates supposed to be coming from?

FallingSnow • Mar 22 '22 20:03

[redacted-worker-nodes]'s /var/log/k0s.err:

...
Error: failed to start kubelet config client: failed to load kubeconfig: invalid configuration: [unable to read client-cert /var/lib/k0s/kubelet/pki/kubelet-client-current.pem for default-auth due to open /var/lib/k0s/kubelet/pki/kubelet
Error: failed to start kubelet config client: failed to load kubeconfig: invalid configuration: [unable to read client-cert /var/lib/k0s/kubelet/pki/kubelet-client-current.pem for default-auth due to open /var/lib/k0s/kubelet/pki/kubelet
Error: failed to start kubelet config client: failed to load kubeconfig: invalid configuration: [unable to read client-cert /var/lib/k0s/kubelet/pki/kubelet-client-current.pem for default-auth due to open /var/lib/k0s/kubelet/pki/kubelet
Error: failed to start kubelet config client: failed to load kubeconfig: invalid configuration: [unable to read client-cert /var/lib/k0s/kubelet/pki/kubelet-client-current.pem for default-auth due to open /var/lib/k0s/kubelet/pki/kubelet
Error: failed to start kubelet config client: failed to load kubeconfig: invalid configuration: [unable to read client-cert /var/lib/k0s/kubelet/pki/kubelet-client-current.pem for default-auth due to open /var/lib/k0s/kubelet/pki/kubelet
Error: failed to start kubelet config client: failed to load kubeconfig: invalid configuration: [unable to read client-cert /var/lib/k0s/kubelet/pki/kubelet-client-current.pem for default-auth due to open /var/lib/k0s/kubelet/pki/kubelet
...

[redacted-worker-nodes]'s /var/log/k0s.log: k0s.log

FallingSnow • Mar 22 '22 21:03

I don't think this is caused by k0sctl. @jnummelin @s0j @soider, any ideas?

kke • Mar 23 '22 11:03

I think the real issue is this:

failed to run Kubelet: mountpoint for cpu not found

I'd guess this is because cgroups are not enabled on this setup. You need to enable them with something like:

sudo rc-update add cgroups boot
sudo rc-service cgroups start

Also make sure you have the corresponding kernel boot opts:

cgroup_enable=memory cgroup_enable=cpuset
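
To confirm whether the controllers are actually available on the worker, something like the following could help. It is only a minimal sketch: the function names `cgroup_controller_enabled` and `check_cgroups` are made up for illustration and are not part of k0s or k0sctl, and it assumes the kernel exposes the usual `/proc/cgroups` table (columns: subsys_name, hierarchy, num_cgroups, enabled).

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of k0s or k0sctl.

# Succeeds (exit 0) if controller $1 is listed with enabled=1 in the
# cgroups table $2. The table defaults to /proc/cgroups (an assumption
# about the host; pass another file to check a saved copy).
cgroup_controller_enabled() {
    awk -v name="$1" \
        '$1 == name && $4 == 1 { found = 1 } END { exit !found }' \
        "${2:-/proc/cgroups}"
}

# Report the controllers mentioned in this thread, one per line.
check_cgroups() {
    for c in cpu cpuset memory; do
        if cgroup_controller_enabled "$c" "${1:-/proc/cgroups}"; then
            echo "$c: enabled"
        else
            echo "$c: missing or disabled"
        fi
    done
}
```

Sourcing this on the worker node and running `check_cgroups` prints one line per controller. A controller reported as missing or disabled would point to the kernel boot options, while a controller reported as enabled alongside the "mountpoint for cpu not found" error would point to the cgroups service not having mounted the hierarchy.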

The cert error is more or less expected in the early phase for kubelet, as it has not yet been able to connect to the API with the given bootstrap config (essentially the token file).

jnummelin • Mar 23 '22 11:03

Is this cgroup thing something k0sctl should detect and maybe configure it or fail?

kke • Mar 23 '22 11:03

> Is this cgroup thing something k0sctl should detect and maybe configure it or fail?

k0s will soon do exactly that in its pre-flight checks, and refuse to start. But maybe that's not the best UX if k0s gets deployed via k0sctl.

twz123 • Mar 23 '22 13:03