
Message Delivery - the Leader skips the messages

Open Xswer opened this issue 3 years ago • 3 comments

We are seeing unexpected behaviour in NATS Streaming:

  1. The publisher client publishes a message to NATS without any errors.
  2. The consumer client never receives the message, as if it had never been sent. The consumer uses a durable queue group subscription.
  3. Messages published before and after it are published and consumed as usual.

What helps:

  • In some cases the problem can be worked around by creating a new durable queue group with 'deliverAllAvailable'. The "skipped" messages are then delivered to the consumers in the new group. Sometimes this does not help.
  • If the first workaround does not help, restarting the leader (or forcing a leader change) does. After the leader restart we create a new durable queue group with 'deliverAllAvailable', and only then are the "skipped" messages delivered to the consumers.

The problem appears consistently on at least 2 channels. We assume that it could be caused by corrupt Raft logs. Could that be the cause?

We suspect this because our redeploy pipeline was misconfigured and performed blue/green deployments, so for a short period of time two instances could reference the same file store.

Could you please tell us whether simultaneous access to the Raft log by 2 instances could corrupt it? And could a corrupted Raft log cause the behaviour described above, where newly published messages are not delivered to clients?

Thank you in advance!

Xswer • Jun 02 '21 06:06