rabbitmq-server

A way to configure durability of the queues used by exchange federation

Open elalonde opened this issue 9 years ago • 15 comments

We are investigating the use of federation of an exchange shared throughout an upstream RabbitMQ cluster. We observe that when we terminate the RabbitMQ instance that represents the location of the durable queue bound to the federated exchange, the downstream node cannot continue to federate even after it attempts to connect to a different URI in the upstream list. The logs contain messages such as:

Federation exchange '(redacted)' in vhost '(redacted)' did not connect to exchange '(redacted)' in vhost '(redacted)' on amqp://(redacted):7777
{{shutdown,{server_initiated_close,404,
     <<"NOT_FOUND - home node '(redacted)' of durable queue 'federation: (redacted)' in vhost '(redacted)' is down or inaccessible">>}},

From the documentation I gather that federation is implemented by the creation of a queue bound to the exchange, and messages are copied to that queue so they can be sent downstream. Documentation makes it clear that this queue resides on exactly one node, by default, so what we are seeing is understandable.

We would like to configure federation such that the queue is not durable, and the behavior is that when the node hosting the queue terminates (or downstream disconnects), the queue is dynamically re-created and re-bound to the exchange on whatever node the downstream connects to at a later time. In essence, on-demand federation where queues are only bound to upstream exchanges if there is a downstream listening.

As you can likely tell, we have no need for message durability in the face of intermittent connectivity. We just want downstream message delivery to resume when the downstream selects another node in the URI list.

elalonde avatar Jun 11 '15 23:06 elalonde

+1

michaelplaing avatar Jun 12 '15 10:06 michaelplaing

@elalonde thank you for reporting. We are able to simulate the same situation.

We could modify the upstream configuration, adding a parameter to configure the durable queue value.

By default, durable = true.

Gsantomaggio avatar Apr 28 '16 10:04 Gsantomaggio

Also note that it's possible to set a TTL on queues used by federation at least in some cases.
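
For readers following along, here is a minimal sketch of what setting an expiry on a federation upstream could look like. It only builds the parameter JSON and the `rabbitmqctl set_parameter` command line and does not run anything. The upstream name `my-upstream`, the host names, and the 60-second value are assumptions for illustration; `expires` and `message-ttl` are the upstream parameter keys documented for the federation plugin. Whether expiry helps when the queue's home node is down is not established in this thread.

```python
import json
import shlex


def upstream_definition(uris, expires_ms=None, message_ttl_ms=None):
    """Build a federation-upstream definition as a plain dict.

    `uris` may list several nodes of the upstream cluster.
    """
    definition = {"uri": list(uris)}
    if expires_ms is not None:
        # Expiry (ms) of the internal upstream queue once the link is gone.
        definition["expires"] = expires_ms
    if message_ttl_ms is not None:
        # TTL (ms) of messages waiting in the internal upstream queue.
        definition["message-ttl"] = message_ttl_ms
    return definition


def set_parameter_command(name, definition, vhost="/"):
    """Return the rabbitmqctl argv that would set this upstream (not executed)."""
    return [
        "rabbitmqctl", "set_parameter", "-p", vhost,
        "federation-upstream", name, json.dumps(definition),
    ]


# Hypothetical values: two upstream nodes, internal queue expires after 60 s.
definition = upstream_definition(
    ["amqp://upstream-a:5672", "amqp://upstream-b:5672"],
    expires_ms=60000,
)
command = set_parameter_command("my-upstream", definition)
print(shlex.join(command))
```

The command is printed rather than executed so it can be reviewed before being run on the downstream node.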

michaelklishin avatar Apr 28 '16 14:04 michaelklishin

What would be the immediate impact of making the upstream's durable value configurable and setting it to false? Would this result in a situation where, after a disconnect, should the downstream attempt to reconnect, the queue will be dynamically re-created and re-bound to the exchange on whatever node the downstream connects to at a later time? (This is our desired goal, as mentioned above.)

elalonde avatar Apr 28 '16 20:04 elalonde

@elalonde links declare the queues/exchanges/bindings they need upon initialization, so yes.

michaelklishin avatar Apr 28 '16 20:04 michaelklishin

OK, that sounds great. Just to clarify one more thing:

We have a bunch of nodes listed in the URI set, so we would like the downstream node to simply pick the next one in the set and resume federating from it. If the original node selected for federation in the URI set is unavailable, and the downstream attempts to reconnect, will it automatically try the next URI in the set? (Sorry, but since I created this issue some time ago the exact behavior of the downstream node in this situation escapes me.)

elalonde avatar Apr 28 '16 20:04 elalonde

I think that is supposed to work the way you expect, so please give it a try.

michaelklishin avatar Apr 28 '16 20:04 michaelklishin

I would be happy to test the implications of a configurable upstream configuration value, in the face of upstream node disconnect.

elalonde avatar Apr 28 '16 20:04 elalonde

+1 We're running into this same issue. Unless I'm misunderstanding (a definite possibility), doesn't this make it impossible to have federation between clusters that's resilient to node failure? I.e. if my upstream cluster has nodeA and nodeB, and the federation queue for exchange E is set up on nodeA, and nodeA goes down, then I can't federate exchange E until nodeA comes back up?

dantswain avatar Jun 16 '16 21:06 dantswain

We are also running into the same problem. Has there been any update to this issue?

tuukkala avatar Oct 14 '16 08:10 tuukkala

FWIW We've configured our federation queues to be HA and that seems to work. So far we haven't observed any performance penalty and in testing it has the desired effect - one node in the cluster can go down and federation still works.

Here is a repo containing scripts that I used to experiment, and a little writeup in the README: https://github.com/dantswain/rabbitmq_ha_federation
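
As a rough illustration of the approach described above (not necessarily what the linked repo does), a mirroring policy can be scoped to the internal federation queues by matching their `federation:` name prefix. The sketch below builds the `rabbitmqctl set_policy` command line without running it. The policy name `ha-federation` and the `{"ha-mode": "all"}` definition are assumptions. Classic mirrored queues were removed in RabbitMQ 4.0, so this applies only to older releases.

```python
import json
import re
import shlex

# Internal federation queues are named "federation: <exchange> -> <node>",
# as seen in the error message and queue names quoted in this thread.
PATTERN = "^federation:"


def set_policy_command(name, pattern, definition, vhost="/"):
    """Return the rabbitmqctl argv for a queue policy (not executed)."""
    return [
        "rabbitmqctl", "set_policy", "-p", vhost,
        "--apply-to", "queues",
        name, pattern, json.dumps(definition),
    ]


# Hypothetical policy: mirror matching queues to all nodes of the upstream cluster.
policy = {"ha-mode": "all"}
command = set_policy_command("ha-federation", PATTERN, policy)
print(shlex.join(command))

# Sanity check: the pattern matches a federation queue but not an ordinary one.
matches_internal = re.search(PATTERN, "federation: test_ex -> rabbit@rabbit4") is not None
matches_other = re.search(PATTERN, "orders") is not None
print(matches_internal, matches_other)
```

The policy would be applied on the upstream cluster, since that is where the internal federation queue lives.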

dantswain avatar Oct 14 '16 12:10 dantswain

+1 We are also experiencing this issue. Is there any new status?

ghost avatar Feb 07 '19 16:02 ghost

@michaelklishin can we consider @dantswain 's solution to be correct?

lukebakken avatar Apr 24 '19 21:04 lukebakken

Durability and mirroring are orthogonal things, even if sometimes they solve the same problem. Making the internal queues durable if the user opts in might be a good idea but could also lead to leaked queues since some parameters that are used to compute queue names are not always stable.

michaelklishin avatar Apr 24 '19 22:04 michaelklishin

We hit the same issue. Is it possible to set the queue "federation: test_ex -> rabbit@rabbit4" as a quorum queue or a mirrored queue without causing other issues, such as a performance penalty?

fqyyang avatar Oct 26 '21 22:10 fqyyang