Kamil Braun
https://github.com/scylladb/scylladb/issues/17903#issuecomment-2060669590 A similar situation here: https://github.com/scylladb/scylladb/issues/17786#issuecomment-2059213151, a decommission racing with a coordinated write. The write uses the "wrong" replica set (in that case there was a crash)
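A minimal Python sketch of the race described above, with illustrative names only (this is not ScyllaDB code): a coordinator snapshots the replica set while a decommission concurrently shrinks the ring, so the write may still target the stale set.

```python
import threading

# Illustrative sketch only (not ScyllaDB code): a coordinator snapshots the
# replica set for a token range while a decommission concurrently removes a
# node from the ring. Without a barrier ordering the two, the write can be
# sent to the stale ("wrong") replica set.
ring = {"n1", "n2", "n3"}      # current owners of the token range
lock = threading.Lock()

def decommission(node):
    with lock:
        ring.discard(node)     # topology change removes the node

def coordinate_write():
    with lock:
        return set(ring)       # snapshot may be taken before or after the change

t = threading.Thread(target=decommission, args=("n3",))
t.start()
replicas = coordinate_write()  # depending on timing, may still include "n3"
t.join()
print("write targeted:", replicas, "ring now:", ring)
```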
@bhalevy Reads to the decommissioning node should be drained before we mark it as dead or purge it from the cluster. The correct decommissioning procedure is as follows: - start double...
There is -- through node_ops RPCs and ring_delay sleeps. It is not fault tolerant and it is not reliable, but in the happy case, when there are no network problems...
Another failure: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/8544/testReport/junit/update_cluster_layout_tests/TestUpdateClusterLayout/Tests___dtest___test_simple_decommission_node_while_query_info_1_/ Uploaded logs: [1714998635438_update_cluster_layout_tests.py TestUpdateClusterLayout test_simple_decommission_node_while_query_info[1].zip](https://github.com/scylladb/scylladb/files/15232670/1714998635438_update_cluster_layout_tests.py.TestUpdateClusterLayout.test_simple_decommission_node_while_query_info.1.zip) [dtest-gw2.log](https://github.com/scylladb/scylladb/files/15232672/dtest-gw2.log)
Dequeued, breaks build: https://github.com/scylladb/scylladb/issues/17699
Please remember to run new tests 100 times, preferably in debug mode (cc @scylladb/scylla-maint)
The timeout for each ping is 300ms. We should indeed increase it, and preferably make it configurable, setting the default to 500ms or something.
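A minimal sketch of what a configurable timeout could look like; the option name is hypothetical and not an actual ScyllaDB config key.

```python
from dataclasses import dataclass

# Sketch only; the option name is hypothetical, not an actual ScyllaDB config
# key. It just illustrates replacing a hard-coded 300ms ping timeout with a
# configurable value defaulting to 500ms.
@dataclass
class FailureDetectorConfig:
    ping_timeout_ms: int = 500          # previously a hard-coded 300

default_cfg = FailureDetectorConfig()                       # 500ms default
slow_network = FailureDetectorConfig(ping_timeout_ms=1000)  # operator override
print(default_cfg.ping_timeout_ms, slow_network.ping_timeout_ms)
```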
I opened https://github.com/scylladb/scylladb/issues/16607
> Is this swapping between raft states actually harmful for my cluster?

@Fornax96 if you're not seeing a log message like "gaining leadership" or "losing leadership" appearing periodically on different...
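One way to check, sketched in Python (a hypothetical helper, not something ScyllaDB ships): count how often those leadership messages appear in each node's log.

```python
import re
import sys

# Hypothetical helper, not part of ScyllaDB: count Raft leadership transitions
# in a node's log. Frequent, periodic matches across different nodes would
# indicate real flapping; occasional ones are expected.
PATTERN = re.compile(r"gaining leadership|losing leadership")

def count_leadership_events(path):
    with open(path) as f:
        return sum(1 for line in f if PATTERN.search(line))

if __name__ == "__main__":
    for log_path in sys.argv[1:]:
        print(log_path, count_leadership_events(log_path))
```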
@Fornax96 hanging repairs could be https://github.com/scylladb/scylladb/issues/17591, which is fixed in 5.4.5 (https://github.com/scylladb/scylladb/commit/e868ade25831aee68da66f742395a96124ab8123), so upgrade your cluster (latest is 5.4.6) and check if repairs keep hanging. The failure detector issue fix...