rabbitmq-server
A way to configure durability of the queues used by exchange federation
We are investigating the use of federation of an exchange shared throughout an upstream rabbitmq cluster. We observe that when we terminate the RabbitMQ instance that represents the location of the durable queue bound to the federated exchange, the downstream node cannot continue to federate even after it attempts to connect to a different URI in the upstream list. The logs contain messages such as:
Federation exchange '(redacted)' in vhost '(redacted)' did not connect to exchange '(redacted)' in vhost '(redacted)' on amqp://(redacted):7777
{{shutdown,{server_initiated_close,404,
<<"NOT_FOUND - home node '(redacted)' of durable queue 'federation: (redacted)' in vhost '(redacted)' is down or inaccessible">>}},
From the documentation I gather that federation is implemented by the creation of a queue bound to the exchange, and messages are copied to that queue so they can be sent downstream. Documentation makes it clear that this queue resides on exactly one node, by default, so what we are seeing is understandable.
We would like to configure federation such that the queue is not durable, and the behavior is that when the node hosting the queue terminates (or downstream disconnects), the queue is dynamically re-created and re-bound to the exchange on whatever node the downstream connects to at a later time. In essence, on-demand federation where queues are only bound to upstream exchanges if there is a downstream listening.
As you can likely tell, we have no need for message durability in the face of intermittent connectivity. We just want downstream message delivery to resume when the downstream selects another node in the URI list.
+1
@elalonde thank you for reporting. We are able to simulate the same situation.
We could modify the upstream configuration, adding a parameter that sets the queue's durable value. By default, durable = true.
Also note that it's possible to set a TTL on queues used by federation at least in some cases.
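As a rough illustration of the TTL remark above, here is a minimal sketch of an upstream definition that lists several URIs and sets `expires` and `message-ttl` (both in milliseconds) on the upstream queue. The upstream name and host names are made up for the example, and the helper only builds the `rabbitmqctl set_parameter` argument list rather than running it. Whether an expiry actually helps when the queue's home node is down would need testing; this does not change the queue's durability.

```python
import json

# Hypothetical upstream name and hosts, chosen only for illustration.
UPSTREAM_NAME = "my-upstream"

upstream = {
    # Several upstream nodes, so the link has alternatives to connect to.
    "uri": ["amqp://node-a.example:5672", "amqp://node-b.example:5672"],
    # Upstream queue may be deleted this long (ms) after the link is lost.
    "expires": 60_000,
    # Messages waiting in the upstream queue are dropped after this long (ms).
    "message-ttl": 30_000,
}


def set_parameter_command(name, value, vhost="/"):
    """Build the argv for `rabbitmqctl set_parameter` (does not execute it)."""
    return [
        "rabbitmqctl", "set_parameter", "-p", vhost,
        "federation-upstream", name, json.dumps(value),
    ]


if __name__ == "__main__":
    print(" ".join(set_parameter_command(UPSTREAM_NAME, upstream)))
```

The values are placeholders; pick them according to how long a downstream outage you are willing to buffer for.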
What would be the immediate impact of making this value configurable in the upstream, and setting it to false? Would it mean that, after a disconnect, when the downstream attempts to reconnect, the queue is dynamically re-created and re-bound to the exchange on whatever node the downstream connects to? (This is our desired goal, as mentioned above.)
@elalonde links declare the queues/exchanges/bindings they need upon initialization, so yes.
OK, that sounds great. Just to clarify one more thing:
We have a bunch of nodes listed in the URI set, so we would like the downstream node to simply pick the next one in the set and resume federating from it. If the original node selected for federation in the URI set is unavailable, and the downstream attempts to reconnect, will it automatically try the next URI in the set? (Sorry, but since I created this issue some time ago, the exact behavior of the downstream node in this situation escapes me.)
I think that is supposed to work the way you expect, so please give it a try.
I would be happy to test the implications of a configurable upstream configuration value, in the face of upstream node disconnect.
+1 We're running into this same issue. Unless I'm misunderstanding (a definite possibility), doesn't this make it impossible to have federation between clusters that's resilient to node failure? I.e. if my upstream cluster has nodeA and nodeB, and the federation queue for exchange E is set up on nodeA, and nodeA goes down, then I can't federate exchange E until nodeA comes back up?
We are also running into the same problem. Has there been any update to this issue?
FWIW We've configured our federation queues to be HA and that seems to work. So far we haven't observed any performance penalty and in testing it has the desired effect - one node in the cluster can go down and federation still works.
Here is a repo containing scripts that I used to experiment, and a little writeup in the README: https://github.com/dantswain/rabbitmq_ha_federation
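For readers who want the gist without opening the repo, this is a minimal sketch of what such a policy could look like; it is not taken from the linked scripts. It assumes classic queue mirroring (`ha-mode`), which is only available in RabbitMQ releases before 4.0, and a pattern that matches the `federation: ` prefix seen in the error message above. The policy name and the sample queue names are illustrative.

```python
import json
import re

# Assumed pattern: internal federation queues are named "federation: ...".
PATTERN = r"^federation:"
# Mirror matching queues to all nodes (classic mirroring, pre-4.0 only).
DEFINITION = {"ha-mode": "all"}


def set_policy_command(name, pattern, definition, vhost="/"):
    """Build the argv for `rabbitmqctl set_policy` (does not execute it)."""
    return [
        "rabbitmqctl", "set_policy", "-p", vhost, "--apply-to", "queues",
        name, pattern, json.dumps(definition),
    ]


def policy_matches(pattern, queue_name):
    """Check locally whether a queue name would be covered by the pattern."""
    return re.search(pattern, queue_name) is not None


if __name__ == "__main__":
    # "ha-federation" is a hypothetical policy name.
    print(" ".join(set_policy_command("ha-federation", PATTERN, DEFINITION)))
    # Example names only: one federation-style queue, one ordinary queue.
    for queue in ["federation: some_exchange -> rabbit@node1", "orders"]:
        print(queue, policy_matches(PATTERN, queue))
```

Mirroring every federation queue to all nodes trades extra replication traffic for availability, so it is worth measuring on a realistic workload before relying on it.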
+1 We are also experiencing this issue. Is there any new status?
@michaelklishin can we consider @dantswain's solution to be correct?
Durability and mirroring are orthogonal things, even if sometimes they solve the same problem. Making the internal queues durable if the user opts in might be a good idea but could also lead to leaked queues since some parameters that are used to compute queue names are not always stable.
We hit the same issue. Is it possible to make the queue "federation: test_ex -> rabbit@rabbit4" a quorum queue or a mirrored queue without causing other problems, such as a performance penalty?