ansible-role-rke2
bug: rke2 upgrade, agent nodes should be upgraded after all the master nodes
Summary
I upgraded RKE2 from v1.22.9 to v1.23.9. The upgrade itself worked fine, but I noticed that some worker nodes were upgraded in between the master nodes, which goes against the RKE2 recommendations:
Note: Upgrade the server nodes first, one at a time. Once all servers have been upgraded, you may then upgrade agent nodes.
see https://docs.rke2.io/upgrade/basic_upgrade/
Ansible Output:
TASK [lablabs.rke2 : Cordon and Drain the node platform-rancher-master-k8s-master-0] ***
skipping: [platform-rancher-master-k8s-master-0]

TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-master-0] ***
changed: [platform-rancher-master-k8s-master-0]

TASK [lablabs.rke2 : Wait for all nodes to be ready again] *********************
FAILED - RETRYING: [platform-rancher-master-k8s-master-0 -> platform-rancher-master-k8s-master-2]: Wait for all nodes to be ready again (100 retries left).
ok: [platform-rancher-master-k8s-master-0 -> platform-rancher-master-k8s-master-2(10.10.50.103)]

TASK [lablabs.rke2 : Uncordon the node platform-rancher-master-k8s-master-0] ***
skipping: [platform-rancher-master-k8s-master-0]

TASK [lablabs.rke2 : Cordon and Drain the node platform-rancher-master-k8s-master-1] ***
skipping: [platform-rancher-master-k8s-master-1]

TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-master-1] ***
changed: [platform-rancher-master-k8s-master-1]

TASK [lablabs.rke2 : Wait for all nodes to be ready again] *********************
ok: [platform-rancher-master-k8s-master-1 -> platform-rancher-master-k8s-master-2(10.10.50.103)]

TASK [lablabs.rke2 : Uncordon the node platform-rancher-master-k8s-master-1] ***
skipping: [platform-rancher-master-k8s-master-1]

TASK [lablabs.rke2 : Cordon and Drain the node platform-rancher-master-k8s-worker-1] ***
skipping: [platform-rancher-master-k8s-worker-1]

TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-worker-1] ***
changed: [platform-rancher-master-k8s-worker-1]

TASK [lablabs.rke2 : Wait for all nodes to be ready again] *********************
FAILED - RETRYING: [platform-rancher-master-k8s-worker-1 -> platform-rancher-master-k8s-master-2]: Wait for all nodes to be ready again (100 retries left).
ok: [platform-rancher-master-k8s-worker-1 -> platform-rancher-master-k8s-master-2(10.10.50.103)]

TASK [lablabs.rke2 : Uncordon the node platform-rancher-master-k8s-worker-1] ***
skipping: [platform-rancher-master-k8s-worker-1]

TASK [lablabs.rke2 : Cordon and Drain the node platform-rancher-master-k8s-master-2] ***
skipping: [platform-rancher-master-k8s-master-2]

TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-master-2] ***
changed: [platform-rancher-master-k8s-master-2]

TASK [lablabs.rke2 : Wait for all nodes to be ready again] *********************
ok: [platform-rancher-master-k8s-master-2]

TASK [lablabs.rke2 : Uncordon the node platform-rancher-master-k8s-master-2] ***
skipping: [platform-rancher-master-k8s-master-2]

TASK [lablabs.rke2 : Cordon and Drain the node platform-rancher-master-k8s-worker-0] ***
skipping: [platform-rancher-master-k8s-worker-0]

TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-worker-0] ***
Issue Type
Bug Report
Ansible Version
ansible [core 2.12.7]
config file = None
configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/lib/python3.10/site-packages/ansible
ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/local/bin/ansible
python version = 3.10.5 (main, Jul 13 2022, 05:45:22) [GCC 10.2.1 20210110]
jinja version = 3.1.2
libyaml = True
Steps to Reproduce
Trigger an RKE2 upgrade, e.g. from v1.22.9 to v1.23.9.
Expected Results
Master nodes should be upgraded first, then the worker nodes
Actual Results
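Nodes are upgraded in a seemingly random order. In the Ansible output above the restart sequence was master-0, master-1, worker-1, master-2, worker-0, so worker-1 was restarted before master-2 had been upgraded.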
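To make the expected ordering concrete, here is a minimal Python sketch. It is not code from the role; the function names `expected_upgrade_order` and `first_violation` are hypothetical, and the set of server hostnames is passed in explicitly rather than guessed from the names. It takes the restart sequence seen in the Ansible output, reports the first agent that was restarted while a server was still pending, and computes an order that would satisfy the "servers first" rule.

```python
from typing import Iterable, List, Optional, Set


def expected_upgrade_order(hosts: Iterable[str], servers: Set[str]) -> List[str]:
    """Return all server nodes first, then all agents.

    The relative order inside each group is kept as given.
    """
    hosts = list(hosts)
    return [h for h in hosts if h in servers] + [h for h in hosts if h not in servers]


def first_violation(observed: Iterable[str], servers: Set[str]) -> Optional[str]:
    """Return the first agent upgraded while a server was still pending, else None."""
    pending = set(servers)  # servers that have not been upgraded yet
    for host in observed:
        if host in servers:
            pending.discard(host)
        elif pending:
            return host  # an agent was upgraded before all servers were done
    return None


# Restart sequence taken from the Ansible output in this report.
prefix = "platform-rancher-master-k8s-"
observed = [prefix + n for n in ("master-0", "master-1", "worker-1", "master-2", "worker-0")]
servers = {prefix + n for n in ("master-0", "master-1", "master-2")}

print("first violation:", first_violation(observed, servers))
print("expected order:", expected_upgrade_order(observed, servers))
```

For the sequence in this report, the check flags worker-1, and the expected order puts master-2 ahead of both workers.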