scylla-ccm

fix(node decommission): stop a decommissioned node by default

Open yarongilor opened this issue 1 year ago • 3 comments

Currently, a decommissioned node still reports is_running() after its status changes to "decommissioned", so it unexpectedly remains part of the cluster nodes, e.g. in cluster.nodelist(). This should not be the default. Instead, the node should be stopped with stop() once the decommission completes successfully (as is done in SCT).

This fix requires a corresponding dtest PR that adjusts all calls to node.decommission(), either removing the 14 unneeded calls to node.stop() or setting the new stop_node=False flag where raft testing needs it.

This fix is a follow-up of: https://github.com/scylladb/scylla-dtest/pull/4767#discussion_r1731340260
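A rough sketch of the proposed behavior, for illustration only. The Node class below is a hypothetical stand-in and not the real scylla-ccm node class; only decommission(), stop(), is_running() and the stop_node flag are taken from the description above, and the method bodies are assumptions.

```python
class Node:
    """Hypothetical stand-in for a ccm node (not the real scylla-ccm class)."""

    def __init__(self, name):
        self.name = name
        self.status = "up"
        self._running = True

    def is_running(self):
        return self._running

    def stop(self):
        # Assumption: the real stop() terminates the node process.
        self._running = False

    def nodetool(self, command):
        # Assumption: the real method shells out to nodetool; here we only
        # record the status change that a successful decommission causes.
        if command == "decommission":
            self.status = "decommissioned"

    def decommission(self, stop_node=True):
        """Decommission the node and, by default, stop it afterwards."""
        self.nodetool("decommission")
        if stop_node:
            self.stop()


# Default: the node is stopped once decommission has completed.
node1 = Node("node1")
node1.decommission()

# Opt-out, e.g. for raft tests that need the process to stay up.
node2 = Node("node2")
node2.decommission(stop_node=False)
```

With a default of stop_node=True, dtest callers that currently follow node.decommission() with node.stop() could drop the second call, while tests that need the process alive would pass stop_node=False explicitly.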

There are also 11 occurrences of nodetool("decommission") to be adjusted/optimized:

$ grep -Eri '\.nodetool\(.*decommission' --include \*.py .
./manager_backup_tests.py:        node3.nodetool("decommission")
./compaction_additional_test.py:                node2.nodetool("decommission")
./manager_restore_tests.py:        node3.nodetool("decommission")
./manager_restore_tests.py:        node3.nodetool("decommission")
./manager_restore_tests.py:            node.nodetool("decommission")
./update_cluster_layout_tests.py:        node1.nodetool("decommission")
./update_cluster_layout_tests.py:        node3.nodetool("decommission", capture_output=False, wait=False)
./update_cluster_layout_tests.py:        node3.nodetool("decommission", capture_output=False, wait=False)
./update_cluster_layout_tests.py:        node3.nodetool("decommission", capture_output=False, wait=False)
./nodetool_additional_test.py:        node2.nodetool("decommission")
./topology_test.py:            out, err = node.nodetool("decommission")

yarongilor avatar Aug 27 '24 09:08 yarongilor

@yarongilor @pehala is this one still needed?

fruch avatar Mar 10 '25 17:03 fruch

> @yarongilor @pehala is this one still needed?

I guess it is still relevant. We just have to prioritize and schedule it along with a dtest PR.

yarongilor avatar Mar 11 '25 07:03 yarongilor

> I guess it is still relevant. We just have to prioritize and schedule it along with a dtest PR.

If you want to make progress on this, please create a dtest PR for it as well.

pehala avatar Mar 11 '25 07:03 pehala