ansible-role-rke2

Playbook stuck while starting the RKE2 service on agents

Open • leon-andria opened this issue 2 years ago • 1 comment

Summary

The troubleshooting section (https://github.com/lablabs/ansible-role-rke2#troubleshooting) says that this hang might be caused by a network limitation.

The actual problem is that the RKE2 install script is never executed on the agents. The task that runs it has a condition on the variable installed_rke2_version, and that variable in turn depends on the condition "rke2-server.service" in ansible_facts.services.
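To see how that condition evaluates on a given node, something like the following could be run against the agents. This is only a diagnostic sketch, not part of the role: it assumes the agent unit is named rke2-agent.service (the issue itself only mentions rke2-server.service), and the debug output keys are made up for illustration.

```yaml
# Diagnostic sketch: populate ansible_facts.services, then print whether
# the RKE2 units are known to the service manager on each host.
- name: Gather service facts
  ansible.builtin.service_facts:

- name: Show which RKE2 units are present
  ansible.builtin.debug:
    msg:
      # Same expression the issue describes as gating installed_rke2_version
      server_unit_present: "{{ 'rke2-server.service' in ansible_facts.services }}"
      # Assumed unit name for agent nodes
      agent_unit_present: "{{ 'rke2-agent.service' in ansible_facts.services }}"
```

If server_unit_present is false on a worker, any task gated on that expression is skipped there.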

Below are the changes I made to fix the issue:

Before the "Run AirGap RKE2 script" task (https://github.com/lablabs/ansible-role-rke2/blob/dc6d4267dd346bb133baf662532bb797e0408270/tasks/rke2.yml#L91), I added the following tasks, which check that the rke2 binary exists instead of relying on this line: https://github.com/lablabs/ansible-role-rke2/blob/dc6d4267dd346bb133baf662532bb797e0408270/tasks/rke2.yml#L89.

- name: Check rke2 bin exists
  ansible.builtin.stat:
    path: "{{ rke2_bin_path }}"
  register: rke2_exists

- name: Check RKE2 version
  ansible.builtin.shell: |
    set -o pipefail
    {{ rke2_bin_path }} --version | grep -E "rke2 version" | awk '{print $3}'
  args:
    executable: /bin/bash
  changed_when: false
  register: installed_rke2_version
  when: rke2_exists.stat.exists
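One thing to watch with this approach: when the binary is missing, the "Check RKE2 version" task is skipped, so installed_rke2_version is still registered but has no stdout. A later task that compares versions has to tolerate that. The sketch below shows one way to write such a condition. It is not the role's actual task: the script path is hypothetical, and passing INSTALL_RKE2_VERSION and INSTALL_RKE2_TYPE assumes the upstream RKE2 install script is the one being run.

```yaml
# Illustrative only: run the installer when the binary is absent or when
# the detected version differs from the requested one.
- name: Run RKE2 install script (sketch, not the role's task)
  ansible.builtin.command:
    cmd: /usr/local/bin/rke2-install.sh  # hypothetical script location
  environment:
    INSTALL_RKE2_VERSION: "{{ rke2_version }}"
    INSTALL_RKE2_TYPE: "{{ rke2_type }}"
  # default('') covers the case where the version check was skipped
  when: >-
    not rke2_exists.stat.exists
    or (installed_rke2_version.stdout | default('')) != rke2_version
```

With this condition a fresh agent (no binary yet) always runs the installer, regardless of which systemd units exist.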

Issue Type

Bug Report

Ansible Version

ansible [core 2.14.2]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3/dist-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] (/usr/bin/python3)
  jinja version = 3.0.3
  libyaml = True

Steps to Reproduce

- name: Deploy RKE2
  hosts: all
  become: yes
  vars:
    rke2_version: v1.26.0+rke2r2
    rke2_api_ip: 192.168.1.10
    rke2_download_kubeconf: true
    rke2_server_node_taints:
      - 'CriticalAddonsOnly=true:NoExecute'
    rke2_cni:
      - cilium
  roles:
    - role: lablabs.rke2

[masters]
master-01 ansible_host=192.168.1.10 rke2_type=server
master-02 ansible_host=192.168.1.11 rke2_type=server
master-03 ansible_host=192.168.1.12 rke2_type=server

[workers]
worker-01 ansible_host=192.168.1.20 rke2_type=agent
worker-02 ansible_host=192.168.1.21 rke2_type=agent

[k8s_cluster:children]
masters
workers

Expected Results

Worker nodes should be provisioned once the rke2.sh script has been executed by the following task: https://github.com/lablabs/ansible-role-rke2/blob/dc6d4267dd346bb133baf662532bb797e0408270/tasks/rke2.yml#L100

Actual Results

The playbook just hangs on the agents until it times out.

leon-andria • Feb 24 '23 21:02

I tried your changes as described, but they didn't work for me (airgapped, 3 workers, HA mode). I still had to run the RKE2 agent script by hand on the workers. The install script runs correctly and the binaries exist, but the agent service was not started (or possibly not created).
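For anyone hitting the same symptom, a task along these lines could be used to check whether the agent unit can be started at all once the install script has run. It is a rough sketch under assumptions: the unit name rke2-agent.service is assumed, rke2_type is taken from the inventory shown in the issue, and this is not something the role is confirmed to do in this form.

```yaml
# Sketch: reload systemd so a newly installed unit file is picked up,
# then enable and start the agent service on worker nodes only.
- name: Ensure the RKE2 agent service is enabled and running
  ansible.builtin.systemd:
    name: rke2-agent.service  # assumed unit name
    state: started
    enabled: true
    daemon_reload: true
  when: rke2_type == 'agent'
```

If this task fails because the unit does not exist, the problem is in the install step rather than in service startup.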

janonym1 • Jun 26 '23 08:06