librdkafka
ListOffsets loop of failed requests on leader epoch change until timeout happens
Description
ListOffsets requests for partitions with no committed offsets can be retried repeatedly, until the request times out, if the partition's leader epoch has changed. This happens because the buffer is retried as-is instead of being recreated with the new CurrentLeaderEpoch received from the Metadata refresh call.
How to reproduce
Start consuming partitions that have no committed offset, or seek to the latest offset. Then a partition leader change must happen that raises the current leader epoch above the cached one. The ListOffsets request fails with FENCED_LEADER_EPOCH and the client refreshes Metadata, but it then retries the buffer with the same CurrentLeaderEpoch, leading to a loop of failed requests.
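The retry behavior described above can be modeled with a small standalone simulation. This is only a sketch: it does not use librdkafka's internals, and every name in it (`sim_err_t`, `sim_request_t`, `sim_broker_list_offsets`, `sim_list_offsets`, the attempt limit standing in for the request timeout) is hypothetical. It assumes the broker fences any request whose CurrentLeaderEpoch is lower than its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical error codes for this sketch; not librdkafka's real enum. */
typedef enum {
    SIM_ERR_NONE = 0,
    SIM_ERR_FENCED_LEADER_EPOCH = 1
} sim_err_t;

/* Hypothetical stand-in for a serialized ListOffsets request buffer. */
typedef struct {
    int32_t current_leader_epoch;
} sim_request_t;

/* Simulated broker: fences requests that carry an older leader epoch. */
static sim_err_t sim_broker_list_offsets(int32_t broker_epoch,
                                         const sim_request_t *req) {
    return req->current_leader_epoch < broker_epoch
               ? SIM_ERR_FENCED_LEADER_EPOCH
               : SIM_ERR_NONE;
}

/*
 * Sends a simulated ListOffsets request, retrying on failure.
 * Returns the number of attempts used, or -1 when max_attempts is
 * exhausted (the attempt limit stands in for the request timeout).
 *
 * rebuild_on_retry == false models the reported behavior: the same
 * buffer is resent even though the metadata cache was refreshed.
 * rebuild_on_retry == true models recreating the request from the
 * refreshed cache before retrying.
 */
static int sim_list_offsets(int32_t cached_epoch, int32_t broker_epoch,
                            int max_attempts, bool rebuild_on_retry) {
    sim_request_t req = { cached_epoch };

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (sim_broker_list_offsets(broker_epoch, &req) == SIM_ERR_NONE)
            return attempt;

        /* Simulated Metadata refresh: the cache learns the new epoch. */
        cached_epoch = broker_epoch;

        /* Only a rebuilt request picks up the refreshed epoch. */
        if (rebuild_on_retry)
            req.current_leader_epoch = cached_epoch;
    }
    return -1;
}
```

With a cached epoch of 4 and a broker epoch of 5, `sim_list_offsets(4, 5, 10, false)` uses up all ten attempts, while `sim_list_offsets(4, 5, 10, true)` succeeds on the second attempt, once the request has been rebuilt with the refreshed epoch.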
Checklist
IMPORTANT: We will close issues where the checklist has not been completed.
Please provide the following information:
- [x] librdkafka version (release number or git tag): 2.1.0+
- [x] Apache Kafka version: any
- [x] librdkafka client configuration: any
- [x] Operating system: any
- [ ] Provide logs (with debug=.. as necessary) from librdkafka
- [ ] Provide broker log excerpts
- [ ] Critical issue