rabbitmq-server

CLI tools: check_if_node_is_quorum_critical: reduce response wait time from peers that are stopped, unreachable or down

Open michaelklishin opened this issue 2 years ago • 0 comments

Previously discussed in https://github.com/rabbitmq/rabbitmq-server/discussions/9522.

Currently, rabbitmq-diagnostics check_if_node_is_quorum_critical does the following to find out whether any quorum queues (QQs) or streams would lose their quorum if the current node were stopped:

  • List QQs with local replicas with minimum quorum
  • List streams with local replicas with minimum quorum
  • Check whether the combined list is empty

To find out if a QQ or stream has "minimum quorum" it contacts all running nodes, where the definition of "running" is that of rabbit_nodes:list_running/0, which contacts other nodes with a 10s timeout.

By using a local snapshot of cluster members (that is, without checking with other nodes to see if they are online/reachable), the effect of down nodes on how long the CLI command takes to return should be significantly reduced.
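To make the idea concrete, here is a minimal Python sketch of the check evaluated purely against a local snapshot. It is an illustration under stated assumptions, not the RabbitMQ implementation: the names `Replicated`, `quorum_size`, `has_minimum_quorum` and `quorum_critical` are hypothetical, quorum is assumed to be a simple majority of members, and "minimum quorum" is assumed to mean that the online members do not exceed that majority. The `online_snapshot` argument stands in for whatever locally known membership view would replace the per-node calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replicated:
    """A quorum queue or stream and the nodes hosting its replicas (hypothetical type)."""
    name: str
    members: frozenset


def quorum_size(member_count: int) -> int:
    # Assumption: quorum is a simple majority of all members.
    return member_count // 2 + 1


def has_minimum_quorum(members: frozenset, online: frozenset) -> bool:
    # Assumption: "minimum quorum" means losing one more online member
    # would leave fewer online members than the majority requires.
    return len(members & online) <= quorum_size(len(members))


def quorum_critical(local_node: str, resources: list, online_snapshot: frozenset) -> list:
    # No remote calls here: only the locally held snapshot is consulted,
    # so unreachable peers cannot add a timeout to the check.
    return [
        r.name
        for r in resources
        if local_node in r.members and has_minimum_quorum(r.members, online_snapshot)
    ]


resources = [
    Replicated("qq1", frozenset({"a", "b", "c"})),
    Replicated("s1", frozenset({"a", "b", "d"})),
]
# Node "c" is missing from the snapshot, so qq1 has only two of three members online.
critical = quorum_critical("a", resources, frozenset({"a", "b", "d"}))
```

In this example only `qq1` is reported for node `a`, because stopping `a` would leave it with a single online member out of three. The trade-off is that a stale snapshot can give a wrong answer, which the slower peer-contacting approach avoids.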

michaelklishin · Oct 20 '23 15:10