kafka icon indicating copy to clipboard operation
kafka copied to clipboard

KAFKA-16249; Improve reconciliation state machine

Open dajac opened this issue 1 year ago • 2 comments

This patch re-work the reconciliation state machine on the server side with the goal to fix a few issues that we have recently discovered.

  • When a member acknowledges the revocation of partitions (by not reporting them in the heartbeat), the current implementation may miss it. The issue is that the current implementation re-compute the assignment of a member whenever there is a new target assignment installed. When it happens, it does not consider the reported owned partitions at all. As the member is supposed to only report its own partitions when they change, the member is stuck.
  • Similarly, as the current assignment is re-computed whenever there is a new target assignment, the rebalance timeout, as it is currently implemented, becomes useless. The issue is that the rebalance timeout is reset whenever the member enters the revocation state. In other words, in the current implementation, the timer is reset when there are no target available even if the previous revocation is not completed yet.

The patch fixes these two issues by not automatically recomputing the assignment of a member when a new target assignment is available. When the member must revoke partitions, the coordinator waits. Otherwise, it recomputes the next assignment. In other words, revoking is really blocking now.

The patch also proposes to include an explicit state in the record. It makes the implementation cleaner and it also makes it more extensible in the future.

The patch also changes the record format. This is a non-backward compatible change. I think that we should do this change to cleanup the record. As KIP-848 is only in early access in 3.7 and that we clearly state that we don't plane to support upgrade from it, this is acceptable in my opinion.

Committer Checklist (excluded from commit message)

  • [ ] Verify design and implementation
  • [ ] Verify test coverage and CI build status
  • [ ] Verify documentation (including upgrade notes)

dajac avatar Feb 13 '24 15:02 dajac

@jeffkbkim Thanks for your comments. I have addressed them.

dajac avatar Feb 16 '24 14:02 dajac

@jeffkbkim Thanks for your comments. I have addressed all of them.

dajac avatar Feb 26 '24 13:02 dajac

This is a non-backward compatible change. I think that we should do this change to cleanup the record. As KIP-848 is only in early access in 3.7 and that we clearly state that we don't plane to support upgrade from it, this is acceptable in my opinion.

Are we gating this record change under anything?

jolshan avatar Mar 11 '24 17:03 jolshan

This is a non-backward compatible change. I think that we should do this change to cleanup the record. As KIP-848 is only in early access in 3.7 and that we clearly state that we don't plane to support upgrade from it, this is acceptable in my opinion.

Are we gating this record change under anything?

There is no gating at all for the records at the moment besides the static configuration to enable the protocol. We will introduce the proper gating based on the new feature version for 3.8. With those changes, we will clearly break the backward compatibility but, as you said, we said that upgrades won't be supported from the EA. I actually thought hard about no doing this but I think that it is preferable to do it in order to not carry on bad record format from the start.

dajac avatar Mar 12 '24 13:03 dajac

@jolshan Thanks for your comments. I addressed them.

dajac avatar Mar 12 '24 13:03 dajac