rabbitmq-server
When a cluster node is removed, consider deleting all non-mirrored queues hosted on it
See this rabbitmq-users thread for background. If a node is removed from a cluster, all non-mirrored queues hosted on it should be removed or migrated (depending on their durability: durable queues are not migrated when their hosting node is unavailable but transient queues are).
This is somewhat of an edge case. There can be other edge cases with the proposed solution above.
Update from 2020-2021: with the introduction of maintenance mode and the planned removal of classic queue mirroring in favor of quorum queues and streams, this becomes both less relevant and a smaller, less risky change.
+1. Users are most likely unaware of which queues reside on the node being removed, in case any are critical, and checking this can be tedious as queue counts increase. Brainstorming a possible set of queue migration policies that put the onus on users to decide how migration is carried out, and for which queues in particular (durable/transient):
- migrate-to-none: simply delete all non-mirrored queues on the removed node
- migrate-to-any: move matching non-mirrored queues to any other cluster node
- migrate-to-node: move matching non-mirrored queues to a specific cluster node
- migrate-to-min-master: move matching non-mirrored queues to the node with the fewest queue masters
Durability is definitely a big factor. Migration would also lengthen the node removal procedure, but that could be the price to pay for retaining non-mirrored queues.
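To make the brainstormed policies concrete, here is a minimal sketch of how a target node might be chosen under each one. It is purely illustrative: none of these policy names exist in RabbitMQ, and `pick_target` and its input format (one `node leader_count` pair per line on stdin) are assumptions made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only: these policy names are a proposal from this thread,
# not a RabbitMQ feature. pick_target and its input format are made up.
#
# usage: pick_target POLICY REMOVED_NODE [PINNED_NODE]
# stdin: one "node leader_count" pair per line
# Prints the chosen target node, or nothing when there is no target.
pick_target() {
  local policy=$1 removed=$2 pinned=${3:-}
  case "$policy" in
    migrate-to-none)
      # Queues are deleted rather than migrated, so there is no target.
      :
      ;;
    migrate-to-any)
      # Take the first remaining node in input order.
      awk -v r="$removed" '$1 != r { print $1; exit }'
      ;;
    migrate-to-node)
      # Use the pinned node, but only if it is still a cluster member.
      awk -v r="$removed" -v p="$pinned" '$1 == p && $1 != r { print $1; exit }'
      ;;
    migrate-to-min-master)
      # Pick the remaining node that hosts the fewest queue masters.
      awk -v r="$removed" '$1 != r' | sort -k2,2n | head -n 1 | awk '{ print $1 }'
      ;;
    *)
      echo "unknown policy: $policy" >&2
      return 1
      ;;
  esac
}
```

For example, `printf 'rabbit@a 5\nrabbit@b 2\nrabbit@c 7\n' | pick_target migrate-to-min-master rabbit@b` selects among the nodes that remain after `rabbit@b` is removed. An empty result means the queue would be deleted (or that the pinned node is unavailable), which is where durability would have to be taken into account.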
Will this be implemented soon?
Now that there is a way to transfer leadership of a queue to a different node, it is an option for, say, durable queues.
@michaelklishin Which method are you referring to? Thanks
@luddd3 I was referring to an internal API introduced for maintenance mode.
You can put a node under maintenance before removing it: the node won't keep or accept any client connections, and it won't host any queue leader replicas. So the effect would be comparable to what this issue tries to achieve.
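The drain-then-remove sequence described above could look roughly like the sketch below. It assumes a RabbitMQ version that ships maintenance mode (`rabbitmq-upgrade drain`) and that the CLI tools can reach both nodes. The `run` and `drain_and_forget` helpers and the `DRY_RUN` variable are hypothetical names for this example, not part of RabbitMQ.

```shell
#!/usr/bin/env bash
# Sketch under stated assumptions: run, drain_and_forget and DRY_RUN are
# hypothetical helpers, not RabbitMQ tooling.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# usage: drain_and_forget NODE_TO_REMOVE REMAINING_NODE
drain_and_forget() {
  local leaving=$1 remaining=$2
  # Put the leaving node into maintenance mode.
  run rabbitmq-upgrade -n "$leaving" drain || return 1
  # Stop the RabbitMQ application on the leaving node.
  run rabbitmqctl -n "$leaving" stop_app || return 1
  # Ask a remaining node to remove the stopped node from the cluster.
  run rabbitmqctl -n "$remaining" forget_cluster_node "$leaving"
}
```

Setting `DRY_RUN=1` and calling `drain_and_forget rabbit@rabbitmqservice-1 rabbit@rabbitmqservice-0` prints the three commands without touching the cluster, which is a cheap way to review them first. Note that draining only moves leader replicas of replicated queues; what happens to non-mirrored queues on the removed node is the subject of this issue.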
@michaelklishin I tried to reproduce the scenario in RabbitMQ 3.8.22. When I removed the node from the cluster, all the queues were deleted. Has this issue been solved? My reproduction steps are below.
- rabbit@rabbitmqservice-0 and rabbit@rabbitmqservice-1 formed a cluster
- test, test2, test3 and test4 were four queues created across the two nodes
test **durable** **rabbit@rabbitmqservice-0**
test2 transient rabbit@rabbitmqservice-0
test3 **durable** **rabbit@rabbitmqservice-1**
test4 transient rabbit@rabbitmqservice-1
- stop rabbitmqservice-1
[rabbitmq@rabbitmqservice-1 rabbitmq-service]$ rabbitmqctl stop_app
Stopping rabbit application on node rabbit@rabbitmqservice-1 ...
[rabbitmq@rabbitmqservice-1 rabbitmq-service]$
test3 queue was deleted, test4 queue was down
- remove rabbitmqservice-1 from cluster
[rabbitmq@rabbitmqservice-0 /]$ rabbitmqctl -n rabbit@rabbitmqservice-0 forget_cluster_node rabbit@rabbitmqservice-1
Removing node rabbit@rabbitmqservice-1 from the cluster
- only the test and test2 queues were left.
[rabbitmq@rabbitmqservice-0 /]$ rabbitmqctl list_queues
Timeout: 60.0 seconds ...
Listing queues for vhost / ...
name messages
test 0
test2 0
@polaris-alioth we can't tell since you haven't shared any details on the properties of those queues. They might have been exclusive, for example. We do not guess in this community.
@michaelklishin I created four queues, all of them non-mirrored:
- test: durable, on rabbit@rabbitmqservice-0
- test2: transient, on rabbit@rabbitmqservice-0
- test3: durable, on rabbit@rabbitmqservice-1
- test4: transient, on rabbit@rabbitmqservice-1