ruby-kafka

OutOfOrderSequenceNumberError for async idempotent producer

Open: paneq opened this issue 2 years ago

  • Version of Ruby: unrelated
  • Version of Kafka: unrelated
  • Version of ruby-kafka: unknown?
Steps to reproduce

I don't have steps to reproduce. I tried in https://github.com/zendesk/ruby-kafka/pull/955, but I don't believe I managed to trigger the problem. Nevertheless, let me share my hypothesis and see whether we agree it would be a problem, even though I can't reproduce it right now.

Imagine you use an idempotent (not transactional) async producer and publish a message without explicitly providing a key, partition_key, or partition. Imagine the message is delivered successfully, but the producer never receives the acknowledgement, for whatever reason.

As far as I understand from reading the code, the assigned partition is not preserved between retries.
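To make the retry concern concrete, here is a minimal, self-contained Ruby sketch. It does not use ruby-kafka: `pick_partition` and `partitions_per_attempt` are hypothetical helpers. They assume that a message with neither key nor partition_key gets a random partition (my understanding of the default partitioner) and that the partition is picked again on every attempt. Under those assumptions a keyed message always lands on the same partition, while a keyless one can move between partitions across retries.

```ruby
require "zlib"

# Hypothetical partitioner modelled on the behaviour assumed in this issue:
# hash the key when there is one, otherwise pick a random partition.
# This is a simplified stand-in, not ruby-kafka's actual Partitioner class.
def pick_partition(partition_count, key: nil, rng: Random.new)
  return Zlib.crc32(key) % partition_count unless key.nil?

  rng.rand(partition_count)
end

# Hypothetical retry loop that asks the partitioner again on every attempt,
# i.e. the partition is NOT preserved between retries.
def partitions_per_attempt(attempts, partition_count, key: nil, rng: Random.new)
  Array.new(attempts) { pick_partition(partition_count, key: key, rng: rng) }
end

rng = Random.new(42)
keyless = partitions_per_attempt(20, 3, rng: rng)
keyed = partitions_per_attempt(20, 3, key: "user-1", rng: rng)

puts "keyless attempts hit partitions #{keyless.uniq.sort.inspect}"
puts "keyed attempts hit partitions #{keyed.uniq.inspect}"
```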

So I believe the following scenario can happen:

  • Send message A to partition 0, successful but not ACKed due to a network glitch
  • Retry sending message A but to partition 1, successful and ACKed
  • Send message B to partition 0, unsuccessful because, for this partition, the Kafka cluster's sequence for this client already includes message A, which the client does not know about.
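The bookkeeping behind this scenario can be sketched with a hypothetical model of per-partition sequence numbers. `SequenceState` is made up for illustration and rests on assumptions I have not verified against ruby-kafka's source: the client advances its sequence for a partition only after an ACK, the broker advances its expected sequence whenever it stores a write, and a resend of the batch the broker already stored on the same partition is ACKed as a duplicate. After the scenario, the client and broker disagree about partition 0, so B goes out starting at a sequence number the broker has already stored and cannot be appended as the next in-order write. If the retry stays on partition 0 instead, both sides stay in sync.

```ruby
# Hypothetical model of per-partition sequence bookkeeping for an idempotent
# producer. Assumptions (not verified against ruby-kafka's source):
# - the client advances its sequence for a partition only after an ACK;
# - the broker advances its expected sequence whenever it stores a write;
# - a resend of the batch the broker already stored is ACKed as a duplicate.
class SequenceState
  def initialize
    @client_next = Hash.new(0) # partition => next sequence the client will send
    @broker_next = Hash.new(0) # partition => next sequence the broker expects
  end

  # The broker stores a new write; the client only learns about it if acked.
  def stored_write(partition, acked:)
    @broker_next[partition] += 1
    @client_next[partition] += 1 if acked
  end

  # Resending the already-stored batch to the SAME partition is ACKed as a
  # duplicate, so the client catches up without a second write.
  def duplicate_acked(partition)
    @client_next[partition] = @broker_next[partition]
  end

  def in_sync?(partition)
    @client_next[partition] == @broker_next[partition]
  end

  def describe(partition)
    "partition #{partition}: client next=#{@client_next[partition]}, " \
      "broker expects=#{@broker_next[partition]}"
  end
end

# Scenario from this issue: A is stored on partition 0 but the ACK is lost,
# then the retry of A goes to partition 1 and is ACKed.
state = SequenceState.new
state.stored_write(0, acked: false)
state.stored_write(1, acked: true)
puts state.describe(0) # => partition 0: client next=0, broker expects=1
puts state.describe(1) # => partition 1: client next=1, broker expects=1

# Baseline: the retry of A stays on partition 0 instead.
baseline = SequenceState.new
baseline.stored_write(0, acked: false)
baseline.duplicate_acked(0)
puts baseline.describe(0) # => partition 0: client next=1, broker expects=1
```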

I believe this can lead to Kafka::OutOfOrderSequenceNumberError (see the explanation in the Java client library), which can then keep recurring while the async producer tries to communicate with the broker. That in turn can fill up the buffer, so that every subsequent attempt to produce a message raises Kafka::BufferOverflow.
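The cascade can be sketched with a hypothetical bounded buffer that never drains because every delivery attempt fails. `SketchAsyncProducer` and `SketchBufferOverflow` are made-up stand-ins for the async producer and Kafka::BufferOverflow, not ruby-kafka classes; the real producer's queue and buffer limits may differ in detail.

```ruby
# Hypothetical stand-in for Kafka::BufferOverflow.
class SketchBufferOverflow < StandardError; end

# Hypothetical async producer: messages stay buffered until a delivery
# attempt succeeds, and produce raises once the buffer is full.
class SketchAsyncProducer
  def initialize(max_buffer_size:, &deliver)
    @max_buffer_size = max_buffer_size
    @buffer = []
    @deliver = deliver # returns true when the broker accepts the batch
  end

  def produce(message)
    if @buffer.size >= @max_buffer_size
      raise SketchBufferOverflow, "buffer full (#{@buffer.size} messages)"
    end

    @buffer << message
  end

  # One background delivery attempt; on failure the batch stays buffered.
  def deliver_once
    @buffer.clear if @deliver.call(@buffer.dup)
  end

  def buffered
    @buffer.size
  end
end

# Every delivery fails, as with a recurring OutOfOrderSequenceNumberError.
producer = SketchAsyncProducer.new(max_buffer_size: 3) { |_batch| false }
errors = 0
5.times do |i|
  begin
    producer.produce("message #{i}")
  rescue SketchBufferOverflow
    errors += 1
  end
  producer.deliver_once
end
puts "buffered=#{producer.buffered} overflow_errors=#{errors}" # => buffered=3 overflow_errors=2
```

Once the buffer is full, every further produce call fails, even though the root cause is a single partition whose sequence numbers went out of sync.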

Expected outcome

If my assumptions above are correct, I see a couple of options:

  • Do not allow messages with nondeterministic partitioning to be enqueued in the async producer when idempotency is enabled
  • or preserve the partition allocation between retries
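Both options can be illustrated with a hypothetical queue; none of the names below (`SketchIdempotentQueue`, `PendingMessage`, `reject_keyless`, `deliver`) are ruby-kafka APIs. The first option rejects keyless messages up front when idempotency is enabled. The second chooses the partition once at enqueue time, stores it on the message, and reuses it on every retry, so the sequence numbers for that partition stay consistent.

```ruby
require "zlib"

# Hypothetical message record: `partition` is chosen once, at enqueue time.
PendingMessage = Struct.new(:value, :key, :partition)

# Hypothetical queue illustrating both options from this issue.
class SketchIdempotentQueue
  def initialize(partition_count:, idempotent: true, reject_keyless: false, rng: Random.new)
    @partition_count = partition_count
    @idempotent = idempotent
    @reject_keyless = reject_keyless
    @rng = rng
  end

  def enqueue(value, key: nil, partition: nil)
    # Option 1: refuse messages whose partition would be picked at random.
    if @idempotent && @reject_keyless && key.nil? && partition.nil?
      raise ArgumentError, "idempotent producer needs an explicit key or partition"
    end

    # Option 2: decide the partition once and store it on the message.
    partition ||= key.nil? ? @rng.rand(@partition_count) : Zlib.crc32(key) % @partition_count
    PendingMessage.new(value, key, partition)
  end

  # Retry loop: every attempt reuses message.partition instead of asking the
  # partitioner again. The block returns true once the attempt is ACKed.
  def deliver(message, max_attempts:)
    tried = []
    max_attempts.times do |attempt|
      tried << message.partition
      break if yield(message.partition, attempt)
    end
    tried
  end
end

queue = SketchIdempotentQueue.new(partition_count: 3, rng: Random.new(1))
message_a = queue.enqueue("A")
# Simulate two lost ACKs followed by a successful attempt.
tried = queue.deliver(message_a, max_attempts: 5) { |_partition, attempt| attempt == 2 }
puts "attempts=#{tried.size} partitions=#{tried.uniq.inspect}"

strict = SketchIdempotentQueue.new(partition_count: 3, reject_keyless: true)
begin
  strict.enqueue("A")
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

Preserving the partition keeps keyless messages usable, while rejecting them is simpler but pushes the choice of key or partition onto every caller.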

The end goal is to not receive OutOfOrderSequenceNumberError exceptions.

Actual outcome

Exceptions as described.

paneq, Aug 04 '22 12:08