ansible-role-rke2
Playbook stuck while starting the RKE2 service on agents
Summary
The troubleshooting section (https://github.com/lablabs/ansible-role-rke2#troubleshooting) suggests that this might be caused by a network limitation.
The actual problem is that the RKE2 script is never executed on the agents: the task that runs it has a condition on the variable installed_rke2_version, and that variable in turn depends on the condition "rke2-server.service" in ansible_facts.services, which does not hold on agent nodes.
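A quick way to confirm this on a given host is sketched below. It is a diagnostic only, not part of the role; it assumes the agent unit is named rke2-agent.service, and the keys in the debug message are illustrative.

```yaml
# Diagnostic sketch: report which RKE2 systemd units Ansible sees on each host.
- name: Gather service facts
  ansible.builtin.service_facts:

- name: Show which RKE2 units are known on this host
  ansible.builtin.debug:
    msg:
      # Expected to be false on agents, which is why the version check is skipped there.
      server_unit_present: "{{ 'rke2-server.service' in ansible_facts.services }}"
      # Assumption: the agent unit is called rke2-agent.service.
      agent_unit_present: "{{ 'rke2-agent.service' in ansible_facts.services }}"
```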
Below are the changes I made to fix the issue.
Before the Run AirGap RKE2 script task (https://github.com/lablabs/ansible-role-rke2/blob/dc6d4267dd346bb133baf662532bb797e0408270/tasks/rke2.yml#L91), I added the following tasks, which check that the rke2 binary exists instead of relying on this line: https://github.com/lablabs/ansible-role-rke2/blob/dc6d4267dd346bb133baf662532bb797e0408270/tasks/rke2.yml#L89.
- name: Check rke2 bin exists
  ansible.builtin.stat:
    path: "{{ rke2_bin_path }}"
  register: rke2_exists

- name: Check RKE2 version
  ansible.builtin.shell: |
    set -o pipefail
    {{ rke2_bin_path }} --version | grep -E "rke2 version" | awk '{print $3}'
  args:
    executable: /bin/bash
  changed_when: false
  register: installed_rke2_version
  when: rke2_exists.stat.exists
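One detail to keep in mind with the tasks above: when Check RKE2 version is skipped, Ansible still registers installed_rke2_version, but without a stdout key. The following is a minimal sketch of a follow-up check that tolerates this. The fact name rke2_needs_install is hypothetical and not part of the role, and the role's real install condition may differ.

```yaml
# Hypothetical follow-up; rke2_needs_install is an illustrative name, not a role variable.
- name: Decide whether the install script still has to run
  ansible.builtin.set_fact:
    # A skipped task registers no stdout, so default('') treats it as "not installed".
    rke2_needs_install: "{{ (installed_rke2_version.stdout | default('')) != rke2_version }}"

- name: Report the decision
  ansible.builtin.debug:
    msg: "RKE2 install needed on {{ inventory_hostname }}: {{ rke2_needs_install }}"
```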
Issue Type
Bug Report
Ansible Version
ansible [core 2.14.2]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3/dist-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] (/usr/bin/python3)
  jinja version = 3.0.3
  libyaml = True
Steps to Reproduce
- name: Deploy RKE2
  hosts: all
  become: yes
  vars:
    rke2_version: v1.26.0+rke2r2
    rke2_api_ip: 192.168.1.10
    rke2_download_kubeconf: true
    rke2_server_node_taints:
      - 'CriticalAddonsOnly=true:NoExecute'
    rke2_cni:
      - cilium
  roles:
    - role: lablabs.rke2
[masters]
master-01 ansible_host=192.168.1.10 rke2_type=server
master-02 ansible_host=192.168.1.11 rke2_type=server
master-03 ansible_host=192.168.1.12 rke2_type=server
[workers]
worker-01 ansible_host=192.168.1.20 rke2_type=agent
worker-02 ansible_host=192.168.1.21 rke2_type=agent
[k8s_cluster:children]
masters
workers
Expected Results
The worker nodes should be provisioned once the rke2.sh script has been executed by the following task: https://github.com/lablabs/ansible-role-rke2/blob/dc6d4267dd346bb133baf662532bb797e0408270/tasks/rke2.yml#L100
Actual Results
The playbook just hangs until it times out.
I tried your changes as described, but they didn't work for me (airgapped, 3 workers, HA mode). I still had to run the RKE2 agent script by hand on the workers. The install script runs correctly and the binaries exist, but the agent service was never started (or never created).