ansible-role-rke2

bug: rke2 upgrade, agent nodes should be upgraded after all the master nodes

Open jakuzure opened this issue 2 years ago • 0 comments

Summary

I upgraded RKE2 from v1.22.9 to v1.23.9. The upgrade itself worked fine, but I noticed that some worker nodes were upgraded in between the master nodes, which goes against the RKE2 recommendations:

Note: Upgrade the server nodes first, one at a time. Once all servers have been upgraded, you may then upgrade agent nodes.

see https://docs.rke2.io/upgrade/basic_upgrade/

Ansible Output:

TASK [lablabs.rke2 : Cordon and Drain the node platform-rancher-master-k8s-master-0] ***
skipping: [platform-rancher-master-k8s-master-0]

TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-master-0] ***
changed: [platform-rancher-master-k8s-master-0]

TASK [lablabs.rke2 : Wait for all nodes to be ready again] *********************
FAILED - RETRYING: [platform-rancher-master-k8s-master-0 -> platform-rancher-master-k8s-master-2]: Wait for all nodes to be ready again (100 retries left).
ok: [platform-rancher-master-k8s-master-0 -> platform-rancher-master-k8s-master-2(10.10.50.103)]

TASK [lablabs.rke2 : Uncordon the node platform-rancher-master-k8s-master-0] ***
skipping: [platform-rancher-master-k8s-master-0]

TASK [lablabs.rke2 : Cordon and Drain the node platform-rancher-master-k8s-master-1] ***
skipping: [platform-rancher-master-k8s-master-1]

TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-master-1] ***
changed: [platform-rancher-master-k8s-master-1]

TASK [lablabs.rke2 : Wait for all nodes to be ready again] *********************
ok: [platform-rancher-master-k8s-master-1 -> platform-rancher-master-k8s-master-2(10.10.50.103)]

TASK [lablabs.rke2 : Uncordon the node platform-rancher-master-k8s-master-1] ***
skipping: [platform-rancher-master-k8s-master-1]

TASK [lablabs.rke2 : Cordon and Drain the node platform-rancher-master-k8s-worker-1] ***
skipping: [platform-rancher-master-k8s-worker-1]

TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-worker-1] ***
changed: [platform-rancher-master-k8s-worker-1]

TASK [lablabs.rke2 : Wait for all nodes to be ready again] *********************
FAILED - RETRYING: [platform-rancher-master-k8s-worker-1 -> platform-rancher-master-k8s-master-2]: Wait for all nodes to be ready again (100 retries left).
ok: [platform-rancher-master-k8s-worker-1 -> platform-rancher-master-k8s-master-2(10.10.50.103)]

TASK [lablabs.rke2 : Uncordon the node platform-rancher-master-k8s-worker-1] ***
skipping: [platform-rancher-master-k8s-worker-1]

TASK [lablabs.rke2 : Cordon and Drain the node platform-rancher-master-k8s-master-2] ***
skipping: [platform-rancher-master-k8s-master-2]

TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-master-2] ***
changed: [platform-rancher-master-k8s-master-2]

TASK [lablabs.rke2 : Wait for all nodes to be ready again] *********************
ok: [platform-rancher-master-k8s-master-2]

TASK [lablabs.rke2 : Uncordon the node platform-rancher-master-k8s-master-2] ***
skipping: [platform-rancher-master-k8s-master-2]

TASK [lablabs.rke2 : Cordon and Drain the node platform-rancher-master-k8s-worker-0] ***
skipping: [platform-rancher-master-k8s-worker-0]

TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-worker-0] ***

Issue Type

Bug Report

Ansible Version

ansible [core 2.12.7]
  config file = None
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.10/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.10.5 (main, Jul 13 2022, 05:45:22) [GCC 10.2.1 20210110]
  jinja version = 3.1.2
  libyaml = True

Steps to Reproduce

Trigger an RKE2 upgrade, e.g. from v1.22.9 to v1.23.9.

Expected Results

All master (server) nodes should be upgraded first, one at a time; the worker (agent) nodes should be upgraded only after that.
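
To make the expected ordering concrete, here is a minimal Python sketch. It is not how the role is implemented. The function name `upgrade_order` and the rule "a host is a server if its name contains `-k8s-master-`" are assumptions made for this example; only the host names are taken from the Ansible output above. Note that a plain `master` substring check would not work here, because every host name starts with `platform-rancher-master`.

```python
def upgrade_order(hosts):
    """Return hosts with all server (master) nodes before all agent (worker) nodes.

    Assumption for this sketch: a host is a server if its name contains
    "-k8s-master-". Python's sorted() is stable, so the relative order
    inside each group is preserved.
    """
    def is_agent(host):
        # False sorts before True, so servers come first.
        return "-k8s-master-" not in host

    return sorted(hosts, key=is_agent)


# Restart order observed in the Ansible output above.
observed = [
    "platform-rancher-master-k8s-master-0",
    "platform-rancher-master-k8s-master-1",
    "platform-rancher-master-k8s-worker-1",
    "platform-rancher-master-k8s-master-2",
    "platform-rancher-master-k8s-worker-0",
]

for host in upgrade_order(observed):
    print(host)
```

With the observed order as input, the three master nodes are listed before either worker node, which is the sequence the RKE2 note asks for.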

Actual Results

Nodes are upgraded in a seemingly arbitrary order: in the output above, worker-1 is restarted after master-1 but before master-2.

jakuzure, Aug 30 '22 08:08