rabbitmq-server icon indicating copy to clipboard operation
rabbitmq-server copied to clipboard

reject-publish ignored by Dead-Letter quorum queues

Open leadenmoth opened this issue 2 years ago • 6 comments

Describe the bug

RabbitMQ 3.11.11, Erlang 25.3, deployed in AKS via cluster-operator 2.2.0

Setting max-length: <int> and overflow: reject-publish policy on a quorum dead letter queue leads to queue continuously filling beyond max-length (I stopped the test at 25000 messages with max-length:10). According to the docs,

When a quorum queue reaches the max-length limit and reject-publish is configured it notifies each publishing channel who from thereon will reject all messages back to the client.

My understanding is that pushing a dead message from main queue to dead letter exchange and then to bound dead letter queue is done entirely on the server side?.. If so, my best guess is that whatever does that publishing (some internal publisher?..) ignores the reject-publish notification. Setting drop-head (well, unsetting reject-publish and letting it default to drop-head) instead works perfectly.

Reproduction steps

  1. Set up normal and DL exchanges, set up normal and DL queues of type quorum, set queue-exchange bindings, set x-dead-letter-exchange, x-dead-letter-routing-key and x-delivery-limit on normal queue
  2. Set up a max-length and reject-publish policy on the DL queue
  3. Publish significantly more messages than max-length to normal queue
  4. Consume and reject messages from normal queue continuously until x-delivery-limit is reached and the messages start moving to DL queue
  5. Observe that all messages are moved to DL queue, regardless of max-length ...

Expected behavior

Messages to-be-dead-lettered above the max-length limit should be discarded when reject-publish is set

Additional context

Context: we have a legacy system prone to causing message pile-ups and going above the high RAM watermark for good (less so after 3.8->3.11 upgrade, due to all the improvements to quorum behavior). Most queues are quorum and have DL counterparts (also quorum) with a single dead letter exchange and routing-key -> binding routing. Message pileups eventually end up in corresponding dead letter queues, still occupying RAM (or HDD, since the 3.11 upgrade). In an effort to free up resources I was trying to limit the size of DL queues. Since there's no process for dealing with DL messages at all (again, legacy system, someone decided having DL would be good, but never got around to using it), the knee-jerk reaction would be to just kill the DL queues. I thought I'd instead keep them for human review with just a few messages, but, crucially, the earliest messages that failed. Sure, failure could be logged elsewhere (and it is), but I thought having all the RabbitMQ fields available for investigation would be nice. My worry was that reject-publish on a DL queue would cause some weird requeueing loop. Instead, it's just ignored.

leadenmoth avatar Jun 07 '23 12:06 leadenmoth

At-least-once dead lettering will respect reject-publish of the dead letter target quorum queue. However, you'll need to take care to not cause message pileup in the source quorum queue. (In your case the messages already pile up in the source quorum queue.)

In an effort to free up resources I was trying to limit the size of DL queues.

Instead of dead lettering to a quorum queue, could you dead letter to a Stream (possible since https://github.com/rabbitmq/rabbitmq-server/pull/7846)?

ansd avatar Jun 07 '23 13:06 ansd

I don't think anywhere in the documentation it says that at-most-once dead lettering does not respect reject-publish. In fact, the at-least-once doc you linked mentions reject-publish as one of the reasons at-most-once may fail to dead-letter a message

Re: streams, I won't be able to justify the extra development work for our almost-non-existent dead lettering use case, but I'll look into it when we begin overhauling this part of the system. Basically, I'm the only one fighting to keep any dead lettering in the current implementation :P

leadenmoth avatar Jun 07 '23 13:06 leadenmoth

Dead lettering does not really follow most of the code paths related to publishing, so this can be one of the examples where it's a mere oversight.

michaelklishin avatar Jun 07 '23 14:06 michaelklishin

I added your feature request to https://github.com/rabbitmq/rabbitmq-server/issues/8261#issuecomment-1580886519

I don't think anywhere in the documentation it says that at-most-once dead lettering does not respect reject-publish.

You're right, we should document that specific limitation.

In fact, the at-least-once doc you linked mentions reject-publish as one of the reasons at-most-once may fail to dead-letter a message

Yes, good point 🙂 Although this still applies to dead letter target classic queues.

I'm just trying to help you with an immediate workaround. You could

  • Switch your target dead letter quorum queue overflow behaviour to drop-head, or
  • Use a classic queue as target dead letter queue with overflow behaviour either drop-head or reject-publish, or
  • Use a Stream as target dead letter queue (configuring x-max-age or x-max-length-bytes).

ansd avatar Jun 07 '23 14:06 ansd

Thanks @ansd - I was going to go with drop-head and leave it at that for the current increment. For our use case it's a nice-to-have, but since it seemed like a bug or a gap in docs, I decided to report it. Depending on how long it takes until we start redesigning the system, I might switch DL to classic and try again - those queues, the way we use them, don't need to be quorum, but it's also a sizeable piece of work to just recreate them as classic due to how our code is layered and per-environment change management happens, and 3.11 upgrade reduced the pain points so that it's harder to justify further changes :D

leadenmoth avatar Jun 07 '23 14:06 leadenmoth

Many thanks for reporting this issue @leadenmoth! To be honest, until you opened this issue, I wasn't aware that quorum queues behave like that.

ansd avatar Jun 07 '23 14:06 ansd