CLI tools: check_if_node_is_quorum_critical: reduce response wait time from peers that are stopped, unreachable or down
Previously discussed in https://github.com/rabbitmq/rabbitmq-server/discussions/9522.
Currently, rabbitmq-diagnostics check_if_node_is_quorum_critical does the following to find out whether any quorum queue (QQ) or stream would lose its quorum if the current node were stopped:
- List QQs that have a local replica and are at minimum quorum
- List streams that have a local replica and are at minimum quorum
- Check whether the combined list is empty
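The steps above can be sketched roughly as follows. This is an illustrative Python model, not the actual Erlang implementation: `ReplicatedQueue`, `has_minimum_quorum` and `quorum_critical` are hypothetical names, and it assumes "minimum quorum" means that the number of online replicas is at or below a bare majority of the members.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicatedQueue:
    """Hypothetical stand-in for a quorum queue or a stream."""
    name: str
    members: frozenset  # every node that hosts a replica


def has_minimum_quorum(queue, online_nodes):
    # Assumption: "minimum quorum" means the online replicas are at or
    # below a bare majority, so losing one more replica loses quorum.
    majority = len(queue.members) // 2 + 1
    online = queue.members & online_nodes
    return len(online) <= majority


def quorum_critical(local_node, queues, online_nodes):
    # Keep only queues with a replica on the local node that are
    # already at minimum quorum.
    return [
        q.name
        for q in queues
        if local_node in q.members and has_minimum_quorum(q, online_nodes)
    ]
```

An empty result corresponds to the check passing: stopping the local node would not cost any queue or stream its quorum.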
To find out whether a QQ or stream has "minimum quorum", the check contacts all running nodes. "Running" here is defined by rabbit_nodes:list_running/0, which contacts other nodes with a 10s timeout, so peers that are stopped, unreachable or down delay the response.
Using a local snapshot of cluster members instead (that is, without checking with other nodes to see whether they are online or reachable) should significantly reduce the effect of down nodes on how long the CLI command takes to return.
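To make the difference concrete, here is a minimal Python sketch of the two ways of obtaining the set of running nodes. The function names and the `probe` callback are hypothetical and only model the idea; they do not correspond to functions in rabbitmq-server.

```python
from typing import Callable, Iterable, Set


def running_by_probing(members: Iterable[str],
                       probe: Callable[[str], bool]) -> Set[str]:
    # One remote call per member; in the real system each call to a
    # down or unreachable peer may block until the timeout expires.
    return {node for node in members if probe(node)}


def running_from_local_snapshot(snapshot: Iterable[str]) -> Set[str]:
    # No remote calls: trust the view the local node already holds.
    return set(snapshot)
```

The trade-off is that a snapshot can be slightly stale, whereas probing is up to date but pays the timeout cost for peers that do not answer.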