k3s-ansible
Playbook hangs on [Enable and check K3s service]
I'm running the playbook with ansible-playbook site.yml -i inventory/sample/hosts.ini -k -K -vv.
It runs successfully up to [Enable and check K3s service] in /node/tasks/main.yml, then it hangs indefinitely. I've run this with varying levels of verbosity and debugging on.
Running this on Raspberry Pi 4Bs, all of which have Raspberry Pi OS Lite.
hosts.ini
I had the same problem.
For me the problem was the firewall: port 6443 was blocked on the master node.
EDIT:
A way to debug is to SSH into your Raspberry Pi and execute the ExecStart command from /etc/systemd/system/k3s-node.service manually.
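For reference, a rough sketch of that debug flow on a worker (assuming the default k3s-node service name and install path this playbook uses; the server and token placeholders come from the unit file):
# print the agent's start command from the unit file
grep ExecStart /etc/systemd/system/k3s-node.service
# then run the printed command by hand to see the error directly, e.g.
sudo /usr/local/bin/k3s agent --server https://<master-ip-or-hostname>:6443 --token <node-token>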
I had the same problem. For me, the master IP wasn't set properly. Anything that stops the k3s-node service from starting will cause that.
I was able to see the error by running systemctl status k3s-node on the node itself.
Thanks for the info, as I had the same problem.
I assume JohnTheNerd is referring to the fact that, when using a hostname for the control node, the installation fails to add that hostname and its IP address to the nodes' /etc/hosts file.
I was using a hostname as well in the [master] section and it failed. Switching to an explicit IP address let it install fine and come up.
An alternative solution is to use Ansible to push an update to the nodes' /etc/hosts file.
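For anyone wanting to script that, an ad-hoc lineinfile call along these lines should push the entry out (hostname, IP, and inventory path are placeholders for your own values):
# add the master's hostname/IP mapping to every node's /etc/hosts
ansible all -i inventory/my-cluster/hosts.ini -b -m lineinfile \
  -a "path=/etc/hosts line='192.168.1.10 kubernetes-master' state=present"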
Update: I put k3s on a Pi cluster using explicit IP addresses and it works, as above.
I then attempted an install on an AWS Lightsail cluster, but it failed because my work machine reaches the primary node via its public IP while the cluster communicates over internal IPs.
I finally went back to using hostnames for the install: I appended the cluster-internal IP assignments to every node's and the master's /etc/hosts, and set the hostname to the public IP in my work machine's /etc/hosts.
That way the cluster machines resolve the internal IP and my machine resolves the public IP. Sweet.
Thanks for the pointers from people who figured this out for their distros. I was able to resolve it on CentOS 7 with: firewall-cmd --zone=public --add-port=6443/tcp
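If you want the rule to survive a reboot, the permanent variant should be something like:
sudo firewall-cmd --zone=public --add-port=6443/tcp --permanent
sudo firewall-cmd --reload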
I had the same problem. I changed the inventory to use IP addresses instead of the names I had set in my .ssh/config file; that solved it!
I changed inventory/hosts.ini to point to IP addresses instead of configured hostnames and the install worked. The hostnames are listed when I do kubectl get nodes:
NAME STATUS ROLES AGE VERSION
control-plane-0 Ready master 3h18m v1.17.5+k3s1
compute-node-1 Ready <none> 4m28s v1.17.5+k3s1
compute-node-2 Ready <none> 4m28s v1.17.5+k3s1
compute-node-0 Ready <none> 4m28s v1.17.5+k3s1
compute-node-3 Ready <none> 4m28s v1.17.5+k3s1
Mine is also failing on that same spot. I'm using Ubuntu Server 21.10 arm64 with k3s v1.22.5-k3s2.
- https://gist.github.com/FilBot3/e8709ef7809527075bb5dc1df7288782
Logs from my nodes. I have 4 total, 1 master and 3 workers, each freshly imaged with /etc/hosts set to map FQDNs to their IPs. Then I bootstrapped with the latest k3s-ansible master. I then reset and disabled the firewall with sudo ufw disable, rebooted, and tried again. It just hangs.
Obviously the k3s.service on the master node is failing to start, which is preventing the k3s-node.service workers from connecting.
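For anyone else stuck here, one way to see why the master's service won't start (assuming the server-side service is named k3s, as in this playbook) is to check it directly on the master:
sudo systemctl status k3s
sudo journalctl -u k3s --no-pager -n 100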
I had this same problem, changing the hostnames from "raspberrypi" to something unique seemed to make it work.
I had the same issue. I'm trying to deploy to a series of VPSs, and it seems like the playbook is trying to use the private IP of the master node when launching the worker nodes.
Many thanks @b-m-f. I also had the firewall/port problem identified above, and the following fixed it for me on Pi 4s running Raspberry Pi OS 64-bit Bullseye Lite with ufw:
sudo ufw allow proto tcp from 192.168.1.0/24 to any port 6443
(obviously adjust the from details to whatever you need for your network)
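To confirm the rule took effect, something like this should do (the master IP is a placeholder; nc may need to be installed on the worker):
sudo ufw status numbered
# from a worker, check that the API port is reachable
nc -zv <master-ip> 6443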
I had the same issue, also on Raspberry Pi OS 64-bit Bullseye Lite. It didn't work until I used IPs instead of hostnames and also followed @nmstoker's suggestion. Many thanks!
Just in case someone comes across this issue running Ubuntu version > 20.04: There's an issue on the k3s project regarding kernel modules: https://github.com/k3s-io/k3s/issues/4234#issuecomment-947954002
Installing the necessary kernel modules helped me; the playbook then ran successfully without hanging.
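For anyone else on Ubuntu Raspberry Pi images, my reading of that k3s issue is that the fix amounts to installing the extra modules package and rebooting, roughly:
# check whether the vxlan module (needed by flannel) is available
sudo modprobe vxlan
# on recent Ubuntu Raspberry Pi images the module ships in a separate package
sudo apt install linux-modules-extra-raspi
sudo reboot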
@Lechindianer You absolute champion, that was my issue! TY so much!
The task Enable and check K3s service restarts and enables the k3s-node service on the nodes:
- name: Enable and check K3s service
  systemd:
    name: k3s-node
    daemon_reload: yes
    state: restarted
    enabled: yes
The problem is related to this setting in roles/k3s/node/templates/k3s.service.j2:
ExecStart=/usr/local/bin/k3s agent --server https://{{ master_ip }}:6443 --token {{ hostvars[groups['master'][0]]['token'] }} {{ extra_agent_args | default("") }}
But the variable master_ip is not always the master's IP; it comes from inventory/my-cluster/hosts.ini. If you put an IP in hosts.ini you get an IP address; if you put a hostname there, you get the hostname:
inventory/my-cluster/group_vars/all.yml:master_ip: "{{ hostvars[groups['master'][0]]['ansible_host'] | default(groups['master'][0]) }}"
But in my case, like the original poster, I set a hostname rather than an IP in hosts.ini, like this:
[master]
kubernetes-master
So when checking on a node, I got
ExecStart=/usr/local/bin/k3s agent --server https://kubernetes-master:6443 --token xxx
But there is no /etc/hosts entry to resolve its IP address, so the agent keeps waiting to join the master and the playbook never finishes (Active: activating (start)):
$ sudo systemctl status k3s-node
● k3s-node.service - Lightweight Kubernetes
Loaded: loaded (/etc/systemd/system/k3s-node.service; enabled; vendor preset: enabled)
Active: activating (start) since Sat 2023-03-18 08:11:51 UTC; 29s ago
Docs: https://k3s.io
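You can confirm the resolution problem from the node itself; if the hostname is in neither /etc/hosts nor DNS, these will come back empty or fail (kubernetes-master is just my example hostname):
getent hosts kubernetes-master
ping -c 1 kubernetes-master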
The way to fix it: if you use a hostname in hosts.ini, you need to add ansible_host, for example:
[master]
kubernetes-master ansible_host=192.168.xxx.xxx
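To double-check that ansible_host is actually set for the master (and therefore what master_ip will resolve to), a quick ad-hoc call along these lines should print it (inventory path per this repo's sample layout); if it reports the variable as not defined, master_ip falls back to the inventory name:
ansible -i inventory/my-cluster/hosts.ini master -m debug -a "var=ansible_host"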
Thanks @ozbillwang for the detailed info, I however fixed this by slightly modifying your suggestion.
In the hosts.ini file I added a new variable:
[master]
HOST_NAME ansible_host_ip=192.168.xxx.xxx
Then, in the group_vars/all.yml file, I changed how the variable master_ip is resolved by replacing ansible_host with ansible_host_ip, which is set above.
Closing this as discussion seems to have ended. It is recommended that
- All nodes have static IPs OR an external load balancer is configured with a fixed registration address. See https://docs.k3s.io/datastore/ha#4-optional-configure-a-fixed-registration-address and https://docs.k3s.io/datastore/cluster-loadbalancer
- Firewalls be disabled. I have another issue open to track attempting to open the firewall ports. https://github.com/k3s-io/k3s-ansible/issues/234