ruby-kafka
OutOfOrderSequenceNumberError for async idempotent producer
- Version of Ruby: unrelated
- Version of Kafka: unrelated
- Version of ruby-kafka: unknown?
Steps to reproduce
I don't have steps to reproduce. I tried in https://github.com/zendesk/ruby-kafka/pull/955, but I don't think I managed to trigger the problem. Nevertheless, let me share my hypothesis so we can see whether we agree this is a problem, even though I can't reproduce it right now.
Imagine you use an idempotent async producer (not transactional) and you publish a message without an explicit key, partition_key, or partition. Imagine the message was delivered properly, but we never received a confirmation, for whatever reason.
As far as I understand from reading the code:
- the delivery of this message will be retried
- the message will get assigned a random partition
I think the assigned partition is not preserved between retries.
So I believe the following scenario can happen:
- Send message A to partition 0, successful but not ACKed due to a network glitch
- Retry sending message A but to partition 1, successful and ACKed
- Send message B to partition 0, unsuccessful because the Kafka cluster expects a different sequence number from this client on this partition: it has already recorded message A there, while the client has not.
I believe this can lead to Kafka::OutOfOrderSequenceNumberError (explanation in java lib), which can then keep occurring while the async producer tries to communicate with the broker. That in turn can fill the buffer, so you get Kafka::BufferOverflow every time you try to produce a message.
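To make the hypothesis concrete, here is a toy model of the scenario in plain Ruby. It does not use ruby-kafka at all: ToyBroker, ToyProducer, and OutOfOrderSequence are made-up names, and the broker is simplified to accept only the next expected sequence number per partition (real brokers also have duplicate-handling logic that is not modeled here). It also assumes the client advances its per-partition sequence only when it sees an ACK.

```ruby
# Toy model only: none of these classes exist in ruby-kafka.
class OutOfOrderSequence < StandardError; end

# Simplified broker: tracks the next expected sequence number per partition.
class ToyBroker
  attr_reader :log

  def initialize
    @expected = Hash.new(0) # partition => next expected sequence number
    @log = Hash.new { |hash, partition| hash[partition] = [] }
  end

  def append(partition, sequence, value)
    unless sequence == @expected[partition]
      raise OutOfOrderSequence,
            "partition #{partition}: expected #{@expected[partition]}, got #{sequence}"
    end
    @log[partition] << value
    @expected[partition] += 1
  end
end

# Simplified client: advances its sequence only when an ACK arrives (assumption).
class ToyProducer
  def initialize(broker)
    @broker = broker
    @next_sequence = Hash.new(0) # partition => next sequence to send
  end

  def send_message(value, partition:, ack_lost: false)
    @broker.append(partition, @next_sequence[partition], value)
    return :no_ack if ack_lost # broker wrote it, but the client never finds out

    @next_sequence[partition] += 1
    :acked
  end
end

broker = ToyBroker.new
producer = ToyProducer.new(broker)

producer.send_message("A", partition: 0, ack_lost: true) # written, ACK lost
producer.send_message("A", partition: 1)                  # retry lands on another partition

outcome =
  begin
    producer.send_message("B", partition: 0) # client and broker now disagree
    :delivered
  rescue OutOfOrderSequence
    :out_of_order
  end
```

In this model, message A ends up in both partitions and message B is rejected on partition 0, because the client and the broker disagree about the sequence for that partition.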
Expected outcome
If my assumptions above are correct, I see a couple of options:
- Do not allow messages with nondeterministic partitioning to be scheduled on the async producer when idempotency is enabled
- or preserve the partition allocation between retries
The end goal is to not receive OutOfOrderSequenceNumberError exceptions.
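As a rough illustration of the second option, the sketch below picks a partition once and reuses it on every retry. It is plain Ruby under my own assumptions, not ruby-kafka code: PendingMessage and deliver_with_retries are hypothetical names, and the block stands in for whatever actually sends the message and reports whether an ACK came back.

```ruby
# Hypothetical sketch: these names are illustrative, not part of ruby-kafka.
PendingMessage = Struct.new(:value, :partition)

# Yields the message for each delivery attempt; the block returns true on ACK.
# Returns the list of partitions used, one entry per attempt.
def deliver_with_retries(message, partition_count:, max_attempts: 3, rng: Random.new)
  # Choose the partition once, before the first attempt, and remember it.
  message.partition ||= rng.rand(partition_count)

  attempts = []
  max_attempts.times do
    attempts << message.partition
    return attempts if yield(message)
  end
  attempts
end

# Usage: the first two attempts "lose" their ACK, the third succeeds.
calls = 0
attempts = deliver_with_retries(PendingMessage.new("A"), partition_count: 8) do |_message|
  calls += 1
  calls >= 3
end
```

Because the partition is stored on the message rather than recomputed per attempt, every retry targets the same partition, so a retried message cannot land on a partition other than the one where it may already have been written.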
Actual outcome
Exceptions as described.