Kamil Braun
https://github.com/scylladb/scylladb/issues/17903#issuecomment-2060669590 A similar situation here: https://github.com/scylladb/scylladb/issues/17786#issuecomment-2059213151, a decommission racing with a coordinated write. The write uses the "wrong" replica set (in that case there was a crash)
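A minimal Python sketch of the race described above, with illustrative names only (this is not ScyllaDB code): a coordinator snapshots the replica set while a decommission concurrently shrinks the ring, so the write may still target the stale set.

```python
import threading

# Illustrative sketch only (not ScyllaDB code): a coordinator snapshots the
# replica set for a token range while a decommission concurrently removes a
# node from the ring. Without a barrier ordering the two, the write can be
# sent to the stale ("wrong") replica set.
ring = {"n1", "n2", "n3"}      # current owners of the token range
lock = threading.Lock()

def decommission(node):
    with lock:
        ring.discard(node)     # topology change removes the node

def coordinate_write():
    with lock:
        return set(ring)       # snapshot may be taken before or after the change

t = threading.Thread(target=decommission, args=("n3",))
t.start()
replicas = coordinate_write()  # depending on timing, may still include "n3"
t.join()
print("write targeted:", replicas, "ring now:", ring)
```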
@bhalevy Reads to the decommissioning node should be drained before we mark it as dead or purge it from the cluster. The correct decommissioning procedure is as follows: - start double...
There is -- through node_ops RPCs and ring_delay sleeps. It is not fault tolerant and it is not reliable, but in the happy case, when there are no network problems...
Another failure: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/8544/testReport/junit/update_cluster_layout_tests/TestUpdateClusterLayout/Tests___dtest___test_simple_decommission_node_while_query_info_1_/ Uploaded logs: [1714998635438_update_cluster_layout_tests.py TestUpdateClusterLayout test_simple_decommission_node_while_query_info[1].zip](https://github.com/scylladb/scylladb/files/15232670/1714998635438_update_cluster_layout_tests.py.TestUpdateClusterLayout.test_simple_decommission_node_while_query_info.1.zip) [dtest-gw2.log](https://github.com/scylladb/scylladb/files/15232672/dtest-gw2.log)
Dequeued, breaks build: https://github.com/scylladb/scylladb/issues/17699
Please remember to run new tests 100 times, preferably in debug mode (cc @scylladb/scylla-maint)
The timeout for each ping is 300ms. We should indeed increase it, and preferably make it configurable, setting the default to 500ms or something.
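A minimal sketch of what a configurable timeout could look like; the option name is hypothetical and not an actual ScyllaDB config key.

```python
from dataclasses import dataclass

# Sketch only; the option name is hypothetical, not an actual ScyllaDB config
# key. It just illustrates replacing a hard-coded 300ms ping timeout with a
# configurable value defaulting to 500ms.
@dataclass
class FailureDetectorConfig:
    ping_timeout_ms: int = 500          # previously a hard-coded 300

default_cfg = FailureDetectorConfig()                       # 500ms default
slow_network = FailureDetectorConfig(ping_timeout_ms=1000)  # operator override
print(default_cfg.ping_timeout_ms, slow_network.ping_timeout_ms)
```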
I opened https://github.com/scylladb/scylladb/issues/16607
> Is this swapping between raft states actually harmful for my cluster?

@Fornax96 if you're not seeing a log message like "gaining leadership" or "losing leadership" appearing periodically on different...
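One way to check, sketched in Python (a hypothetical helper, not something ScyllaDB ships): count how often those leadership messages appear in each node's log.

```python
import re
import sys

# Hypothetical helper, not part of ScyllaDB: count Raft leadership transitions
# in a node's log. Frequent, periodic matches across different nodes would
# indicate real flapping; occasional ones are expected.
PATTERN = re.compile(r"gaining leadership|losing leadership")

def count_leadership_events(path):
    with open(path) as f:
        return sum(1 for line in f if PATTERN.search(line))

if __name__ == "__main__":
    for log_path in sys.argv[1:]:
        print(log_path, count_leadership_events(log_path))
```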
@Fornax96 hanging repairs could be https://github.com/scylladb/scylladb/issues/17591, which is fixed in 5.4.5 (https://github.com/scylladb/scylladb/commit/e868ade25831aee68da66f742395a96124ab8123), so upgrade your cluster (latest is 5.4.6) and check if repairs keep hanging. The failure detector issue fix...