
Playbook hangs on [Enable and check K3s service]

nwber opened this issue 4 years ago

I'm running the playbook with ansible-playbook site.yml -i inventory/sample/hosts.ini -k -K -vv.

It runs successfully up to [Enable and check K3s service] in /node/tasks/main.yml, then it hangs indefinitely. I've run this with varying levels of verbosity and debugging on.

Running this on Raspberry Pi 4Bs, all of which have Raspberry Pi OS Lite.

(screenshot attached: ansible-checks-k3s-hangs)

(attachments: hosts.ini, hosts)

nwber · Jul 13 '20

I had the same problem.

For me the problem was the Firewall. Port 6443 was blocked on the master node.

EDIT:

A way to debug is to SSH into your Raspberry Pi and execute the ExecStart command from /etc/systemd/system/k3s-node.service manually.
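
A minimal sketch of that debugging step; the unit name k3s-node matches this playbook, while the server and token values below are placeholders:

# on the affected worker node
grep ExecStart /etc/systemd/system/k3s-node.service
# run the printed command by hand to see its error output, for example:
sudo /usr/local/bin/k3s agent --server https://<master-ip>:6443 --token <node-token>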

b-m-f · Jul 20 '20

I had the same problem. For me, the master IP wasn't set properly. Anything that stops the k3s-node service from starting will cause that.

I was able to see the error by running systemctl status k3s-node on the node itself.
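
If systemctl status truncates the output, the full agent log is also available through standard systemd tooling (nothing specific to this playbook), for example:

sudo journalctl -u k3s-node --no-pager -n 50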

JohnTheNerd · Jul 21 '20

Thanks for the info, as I had the same problem.

I assume JohnTheNerd is referring to the fact that, when a hostname is used for the control node, the installation fails to append that hostname and its IP address to the nodes' /etc/hosts file.

I was using a hostname in the [master] section as well and it failed. I switched to an explicit IP address and it installed fine and came up.

An alternate solution is to use Ansible to push an update out to the nodes' /etc/hosts files (see the sketch below).

Update: I put k3s on a Pi cluster using explicit IP addresses and it works as described above. I then attempted an install on an AWS Lightsail cluster, but it failed because the master is reached via its public IP from my work machine while the cluster uses internal IPs. I finally went back to using hostnames for the install by appending the cluster-internal IP assignments to all nodes' and the master's /etc/hosts, then setting the public-IP hostname in my work machine's /etc/hosts.

Thus the cluster machines had the internal IP and my machine had the public IP. Sweet.
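
A minimal sketch of that /etc/hosts approach with Ansible's lineinfile module; the group name, IP address, and hostname below are placeholders rather than values from this playbook:

- name: Make the master hostname resolvable on every node (illustrative sketch)
  hosts: node
  become: yes
  tasks:
    - name: Add the master to /etc/hosts
      lineinfile:
        path: /etc/hosts
        line: "192.168.1.10 kubernetes-master"
        state: present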

dougbertwashere · Aug 27 '20

Thanks for the pointers from people who figured this out for their distros. I was able to resolve it on CentOS 7 by doing: firewall-cmd --zone=public --add-port=6443/tcp
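
Worth noting as standard firewalld behavior (not verified on that cluster): the rule above only changes the runtime configuration, so to keep it across reloads and reboots you would typically also run:

firewall-cmd --permanent --zone=public --add-port=6443/tcp
firewall-cmd --reload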

brobare · Apr 08 '21

I had the same problem. I changed the inventory to use the IP addresses instead of the names I set in my .ssh/config file. That solved it!

bVdCreations · Jun 12 '21

I changed inventory/hosts.ini to point to IP addresses instead of configured hostnames and the install worked. The hostnames are still listed when I do kubectl get nodes:

NAME              STATUS   ROLES    AGE     VERSION
control-plane-0   Ready    master   3h18m   v1.17.5+k3s1
compute-node-1    Ready    <none>   4m28s   v1.17.5+k3s1
compute-node-2    Ready    <none>   4m28s   v1.17.5+k3s1
compute-node-0    Ready    <none>   4m28s   v1.17.5+k3s1
compute-node-3    Ready    <none>   4m28s   v1.17.5+k3s1

aireilly · Sep 23 '21

Mine is also failing at that same spot. I'm using Ubuntu Server 21.10 arm64 with k3s v1.22.5-k3s2.

  • https://gist.github.com/FilBot3/e8709ef7809527075bb5dc1df7288782

Logs from my nodes. I have 4 total, 1 master and 3 workers, each freshly imaged with /etc/hosts set to map FQDNs to their IPs. Then I bootstrapped with the latest k3s-ansible master. I then reset and disabled the firewall with sudo ufw disable, rebooted, and tried again. It just hangs.

Obviously the k3s.service on the master node is failing to start, which is preventing the k3s-node.service workers from connecting.

FilBot3 · Oct 29 '21

I had this same problem; changing the hostnames from "raspberrypi" to something unique seemed to make it work.

jbeere · Dec 05 '21

I had the same issue. I'm trying to deploy to a series of VPSs, and it seems like the playbook is trying to use the private IP of the master node when launching the worker nodes.

orzen · Dec 10 '21

Many thanks @b-m-f, I also had the firewall/port problem identified above, and the following fixed it for me on Pi 4s running Raspberry Pi OS 64-bit Bullseye Lite with ufw:

sudo ufw allow proto tcp from 192.168.1.0/24 to any port 6443

(obviously adjust the from details to whatever you need for your network)

nmstoker · Jan 15 '22

I had the same issue, also using Raspberry Pi OS 64-bit Bullseye Lite. Not until I ran with IPs instead of hostnames and followed @nmstoker's suggestion did it work. Many thanks!

JohanNicander · Mar 31 '22

Just in case someone comes across this issue running Ubuntu version > 20.04: There's an issue on the k3s project regarding kernel modules: https://github.com/k3s-io/k3s/issues/4234#issuecomment-947954002

Installing the necessary kernel modules helped me; the playbook ran successfully without hanging.
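
For Ubuntu's Raspberry Pi images, my reading of the linked issue is that the missing modules (e.g. vxlan) ship in a separate package; the package name below is an assumption for Ubuntu arm64 Pi images, not a confirmed fix for every setup:

sudo apt install linux-modules-extra-raspi
sudo reboot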

Lechindianer · Nov 13 '22

@Lechindianer You absolute champion, that was my issue! TY so much!

Zedifuu · Jan 22 '23

The task Enable and check K3s service restarts and enables the k3s-node service on the nodes:

- name: Enable and check K3s service
  systemd:
    name: k3s-node
    daemon_reload: yes
    state: restarted
    enabled: yes

The problem is related to this setting in roles/k3s/node/templates/k3s.service.j2:

ExecStart=/usr/local/bin/k3s agent --server https://{{ master_ip }}:6443 --token {{ hostvars[groups['master'][0]]['token'] }} {{ extra_agent_args | default("") }}

But the variable master_ip is not always the master's IP; it comes from inventory/my-cluster/hosts.ini. If you put an IP in hosts.ini, you get the IP address; if you put a hostname in it, you get the hostname:

inventory/my-cluster/group_vars/all.yml:master_ip: "{{ hostvars[groups['master'][0]]['ansible_host'] | default(groups['master'][0]) }}"

But in my case, I set the hostname, not the IP, in hosts.ini, like the original poster did:

[master]
kubernetes-master

So when I checked on the node, I got:

ExecStart=/usr/local/bin/k3s agent --server https://kubernetes-master:6443 --token xxx

But there is no /etc/hosts entry to resolve its IP address, so the agent keeps waiting to join the master and the playbook never finishes (Active: activating (start)):

$ sudo systemctl status k3s-node
● k3s-node.service - Lightweight Kubernetes
   Loaded: loaded (/etc/systemd/system/k3s-node.service; enabled; vendor preset: enabled)
   Active: activating (start) since Sat 2023-03-18 08:11:51 UTC; 29s ago
     Docs: https://k3s.io

The way to fix it: if you use a hostname in hosts.ini, you need to add ansible_host.

For example:

[master]
kubernetes-master ansible_host=192.168.xxx.xxx

ozbillwang · Mar 18 '23

Thanks @ozbillwang for the detailed info; I fixed this, however, by slightly modifying your suggestion.

In the hosts.ini file I added a new variable:

[master]
HOST_NAME ansible_host_ip=192.168.xxx.xxx

Then, in the group_vars/all.yml file, I changed how the variable master_ip is resolved by replacing ansible_host with ansible_host_ip, which is already set above.
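
A sketch of what the resulting line in group_vars/all.yml would look like with that change, derived from the default shown earlier and the ansible_host_ip variable defined above:

master_ip: "{{ hostvars[groups['master'][0]]['ansible_host_ip'] | default(groups['master'][0]) }}"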

OladapoAjala · May 03 '23

Closing this as discussion seems to have ended. It is recommended that

  1. All nodes have static IPs OR an external load balancer is configured with a fixed registration address. See https://docs.k3s.io/datastore/ha#4-optional-configure-a-fixed-registration-address and https://docs.k3s.io/datastore/cluster-loadbalancer
  2. Firewalls be disabled. I have another issue open to track attempting to open the firewall ports. https://github.com/k3s-io/k3s-ansible/issues/234

dereknola · Nov 09 '23