nats-streaming-server
Message Delivery - the Leader skips the messages
We are seeing strange behaviour in NATS Streaming:
- The publisher client publishes a message to NATS without any errors.
- The consumer client never receives the message, as if it had never been sent. The consumer uses a durable queue group subscription.
- Messages published before and after the missing one are delivered and consumed as usual.
What helps:
- In some cases the problem can be solved by creating a new durable queue group with 'deliverAllAvailable'. The "skipped" messages are then delivered to the consumers in the new group. Sometimes this does not help.
- If the first step does not help, restarting the leader (or a leader change) solves the problem: after the restart we can create a new durable queue group with 'deliverAllAvailable', and only then are the "skipped" messages received by the consumers.
The problem appears consistently on at least 2 channels. We suspect that it could be caused by corrupt Raft logs. Could that be the cause?
How we arrived at this assumption: our redeploy pipeline was misconfigured and allowed blue/green deployment. That means that two instances could reference the same file store for a short period of time.
Could you please tell us whether simultaneous access to the Raft log by 2 instances could corrupt it? And could a corrupted Raft log cause the behaviour described above, where newly published messages are not delivered to clients?
Thank you in advance!