scylla-cluster-tests

Decommissioning a node should not fail while node is in state UL

Open • cezarmoise opened this issue 10 months ago • 3 comments

When decommissioning a node, it looks like SCT doesn't wait long enough, or the checks for decommission completion are wrong.

[2025-01-24T07:40:53.662Z] < t:2025-01-24 07:40:53,192 f:cluster.py l:4956 c:sdcm.cluster p:ERROR > Decommission Node longevity-parallel-topology-schema--db-node-0b0e042d-6 [54.83.0.13 | 10.12.10.16] FAIL
[2025-01-24T07:40:53.662Z] < t:2025-01-24 07:40:53,203 f:cluster.py l:4957 c:sdcm.cluster p:ERROR > Node that was decommissioned Node longevity-parallel-topology-schema--db-node-0b0e042d-6 [54.83.0.13 | 10.12.10.16] still in the cluster. Cluster status info: {'us-east': {'10.12.10.16': {'state': 'UL', 'load': '36.02GB', 'tokens': '256', 'owns': '?', 'host_id': 'c02808f5-a198-4db0-9095-5b7c6c87615d', 'rack': '1c'}, '10.12.8.163': {'state': 'UN', 'load': '22.11GB', 'tokens': '256', 'owns': '?', 'host_id': '60792997-a527-4d0f-9f27-82c73c75899d', 'rack': '1c'}, '10.12.9.117': {'state': 'UN', 'load': '30.55GB', 'tokens': '256', 'owns': '?', 'host_id': '32d3dce1-3026-47cf-ad11-6a2362480328', 'rack': '1c'}, '10.12.9.191': {'state': 'UN', 'load': '27.81GB', 'tokens': '256', 'owns': '?', 'host_id': '33627ba4-25f7-441e-8bf0-e6a5e7cbcbed', 'rack': '1c'}, '10.12.9.96': {'state': 'UN', 'load': '31.59GB', 'tokens': '256', 'owns': '?', 'host_id': 'fb23d549-a569-4b91-9c98-3c94c41cfe23', 'rack': '1c'}}}

| IP Address | State | Load | Tokens | Owns | Host ID | Rack |
|---|---|---|---|---|---|---|
| 10.12.10.16 | UL | 36.02GB | 256 | ? | c02808f5-a198-4db0-9095-5b7c6c87615d | 1c |
| 10.12.8.163 | UN | 22.11GB | 256 | ? | 60792997-a527-4d0f-9f27-82c73c75899d | 1c |
| 10.12.9.117 | UN | 30.55GB | 256 | ? | 32d3dce1-3026-47cf-ad11-6a2362480328 | 1c |
| 10.12.9.191 | UN | 27.81GB | 256 | ? | 33627ba4-25f7-441e-8bf0-e6a5e7cbcbed | 1c |
| 10.12.9.96 | UN | 31.59GB | 256 | ? | fb23d549-a569-4b91-9c98-3c94c41cfe23 | 1c |
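
For illustration only, here is a minimal sketch of a check that treats `UL` (Up/Leaving) as "decommission still in progress" and keeps polling until the node disappears from the status or a timeout expires. This is not SCT's actual code: `node_state`, `wait_for_decommission`, and the `get_status` callable are hypothetical names, and the only assumption taken from the log above is the shape of the status dict (`{datacenter: {ip: {'state': ...}}}`).

```python
import time
from typing import Callable, Dict, Optional

# Assumed shape, copied from the "Cluster status info" log line:
# {datacenter: {ip: {"state": "UN", ...}}}
Status = Dict[str, Dict[str, Dict[str, str]]]


def node_state(status: Status, ip: str) -> Optional[str]:
    """Return the state string for `ip`, or None if the node is not listed."""
    for nodes in status.values():
        if ip in nodes:
            return nodes[ip].get("state")
    return None


def wait_for_decommission(
    get_status: Callable[[], Status],  # hypothetical: returns a fresh status dict
    ip: str,
    timeout: float = 600.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until `ip` is gone from the cluster status.

    Returns True once the node is no longer listed, False if it is
    still listed (for example in state UL) when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        if node_state(get_status(), ip) is None:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

With a check like this, a node that is still `UL` at the first poll would only be reported as a failure after the timeout, which would cover the case where decommission finishes a few minutes later. The timeout and interval values here are arbitrary placeholders.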

https://argus.scylladb.com/tests/scylla-cluster-tests/0b0e042d-60a7-4dad-832d-4e38f2e5a5e9

cezarmoise • Jan 27 '25 13:01

@cezarmoise are you sure it's not a Scylla issue? I don't recall us changing how we wait for decommission to complete recently, so something likely changed in Scylla.

soyacz • Jan 27 '25 14:01

> @cezarmoise are you sure it's not a Scylla issue? I don't recall us changing how we wait for decommission to complete recently, so something likely changed in Scylla.

The node does finish decommissioning a few minutes later.

cezarmoise • Jan 27 '25 16:01

@cezarmoise please move this to Scylla core. I don't see how it has anything to do with SCT.

fruch • Mar 13 '25 11:03

@cezarmoise What's the update on this? Is this still happening?

pehala • Aug 18 '25 08:08

I have not seen it reproduce since.

cezarmoise • Aug 18 '25 08:08