Fix: ha-cluster-join fails to add nodes to an existing cluster after the version update (bsc#1201785)

Open nicholasyang2022 opened this issue 2 years ago • 4 comments

Mechanism

  • Add a mechanism for updating the cluster configuration after a version update:
    • An upgrade sequence is written to ~hacluster/crmsh/upgrade_seq, and crmsh checks it on every run to determine whether an upgrade is needed.
    • The upgrade is performed only when all nodes in the cluster are running a compatible version of crmsh. Compatibility is determined by comparing CURRENT_UPGRADE_SEQ in upgradeutil.py.
    • The upgrade requires user confirmation and is only performed in interactive mode.
  • Set up passwordless SSH authentication for user hacluster using that mechanism.
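
As a rough illustration of the sequence check described above (not the code from this PR), the comparison could look like the sketch below. The "major.minor" file format, the value of CURRENT_UPGRADE_SEQ, and the helper names parse_seq, read_seq and needs_upgrade are all assumptions made for the example.

```python
from pathlib import Path

# Assumed value and format; the real constant lives in upgradeutil.py.
CURRENT_UPGRADE_SEQ = (1, 0)


def parse_seq(text):
    """Parse a 'major.minor' string into a tuple of ints (assumed format)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def read_seq(path):
    """Return the recorded sequence, or (0, 0) if the file is missing or malformed."""
    try:
        return parse_seq(Path(path).read_text())
    except (OSError, ValueError):
        return (0, 0)


def needs_upgrade(path):
    """True when the recorded sequence is older than the one this code expects."""
    return read_seq(path) < CURRENT_UPGRADE_SEQ
```

Treating a missing or unreadable file as (0, 0) means a node that has never been upgraded is always reported as needing one, which is a design choice of this sketch rather than something the PR description states.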

Use Cases

Upgrade is performed in the following scenarios:

  1. Running any crmsh command in interactive mode when crmsh on all nodes of the cluster has been upgraded to the same version.

A warning is printed in the following scenarios:

  1. Nodes of a cluster are running different versions of crmsh.
  2. An upgrade is needed and crmsh is not running in interactive mode.

Upgrade will not work in the following scenarios:

  1. Joining a cluster: crmsh is running on the joining node, which is not yet a member of the cluster, so it cannot perform the upgrade on the existing nodes.
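
The scenarios above can be summarized as a small decision function. This is only a sketch under stated assumptions: the Action enum, the decide function and its parameters are hypothetical, and the order in which the checks are applied is a guess rather than the behavior of the PR. The joining-node case is not modelled, because that node is not a cluster member and never reaches this decision.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"        # nothing to do
    UPGRADE = "upgrade"  # ask the user for confirmation, then upgrade
    WARN = "warn"        # print a warning and continue


def decide(local_seq, recorded_seq, peer_seqs, interactive):
    """Pick an action for a crmsh run on a cluster member (hypothetical helper).

    local_seq:    sequence expected by crmsh on this node
    recorded_seq: sequence recorded in the cluster configuration
    peer_seqs:    sequences expected by crmsh on the other nodes
    interactive:  whether crmsh is running in interactive mode
    """
    if recorded_seq >= local_seq:
        return Action.NONE  # configuration is already up to date
    if any(seq != local_seq for seq in peer_seqs):
        return Action.WARN  # nodes run different versions of crmsh
    if not interactive:
        return Action.WARN  # upgrade needed, but the user cannot confirm
    return Action.UPGRADE
```

For example, `decide((1, 0), (0, 0), [(1, 0)], interactive=True)` yields `Action.UPGRADE`, while the same call with `interactive=False` yields `Action.WARN`.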

nicholasyang2022 avatar Sep 26 '22 06:09 nicholasyang2022

Waiting for krig/parallax#15

nicholasyang2022 avatar Sep 27 '22 04:09 nicholasyang2022

Hi @nicholasyang2022, now that the parallax request has been accepted, I think you can remove the remoteutil.py code and use parallax instead?

liangxin1300 avatar Oct 09 '22 06:10 liangxin1300

Loop #742 here

liangxin1300 avatar Oct 09 '22 06:10 liangxin1300

I found that there are lots of differences in original_regression_test. Please use the no_reg=True option for the get_stdout_stderr function, and please also rebase on https://github.com/ClusterLabs/crmsh/pull/1036, which contains a minor change for the regression test.

liangxin1300 avatar Oct 13 '22 15:10 liangxin1300

Updates:

hacluster's ssh key pair will be generated automatically now when a new node joins a cluster.

nicholasyang2022 avatar Nov 21 '22 05:11 nicholasyang2022

Good, the joining node can now join the running cluster successfully, without failures:

15sp4-3:~ # crm cluster join -c 15sp4-1 -y
WARNING: chronyd.service is not configured to start at system boot.
INFO: SSH key for root does not exist, hence generate it now
INFO: Configuring SSH passwordless with root@15sp4-1
Password: 
INFO: SSH key for hacluster does not exist, hence generate it now
INFO: Configuring SSH passwordless with hacluster@15sp4-1
INFO: Configuring SSH passwordless with root@15sp4-2
Password: 
INFO: Configuring SSH passwordless with hacluster@15sp4-2

My steps:

  1. Set up a cluster on node1 and node2
  2. Remove hacluster's .ssh directory on both node1 and node2
  3. Update node1, node2 and node3 to the new code
  4. Join node3; afterwards, hacluster's .ssh directory is back on node1 and node2

liangxin1300 avatar Nov 21 '22 09:11 liangxin1300

And if only the existing nodes are updated, the steps are:

  1. Set up a cluster on node1 and node2
  2. Remove hacluster's .ssh directory on both node1 and node2
  3. Update node1 and node2 to the new code
  4. Run any crm command on either node (node1 or node2); the process looks like this:
15sp4-1:~ # crm
Upgrade of crmsh configuration: Setup passwordless ssh authentication for user hacluster (y/n)? y
INFO: Upgrade of crmsh configuration succeeded.
  5. Then hacluster's .ssh directory is back on node1 and node2

liangxin1300 avatar Nov 21 '22 09:11 liangxin1300