kafka icon indicating copy to clipboard operation
kafka copied to clipboard

KAFKA-17442: Handled persister errors with write async calls (KIP-932)

Open apoorvmittal10 opened this issue 1 year ago • 1 comments

The PR makes the persister write RPC async. Also handles the errors from persister as per the review comment here: Addressing review comment for PR: https://github.com/apache/kafka/pull/16397#discussion_r1725454927

The read persister RPC is still sync which shall be addressed in next PR as that needs additional handling.

The error returned from persister are sent to Share Partition Manager to take the appropriate action. Once leader epoch handling is implemented then we can differentiate between STATE_EPOCH VS NOT_LEADER errors.

Committer Checklist (excluded from commit message)

  • [ ] Verify design and implementation
  • [ ] Verify test coverage and CI build status
  • [ ] Verify documentation (including upgrade notes)

apoorvmittal10 avatar Aug 22 '24 00:08 apoorvmittal10

Thanks for the PR. I had a query, what will happen to the ongoing write state requests to the persister when you close a share partition. I understand that it is fine if some write state calls are not completed, but will cause any problems while close of SharePartitionManager?

adixitconfluent avatar Aug 29 '24 14:08 adixitconfluent

Thanks for the PR. I had a query, what will happen to the ongoing write state requests to the persister when you close a share partition. I understand that it is fine if some write state calls are not completed, but will cause any problems while close of SharePartitionManager?

The broker must wait for the pending requests to be completed, so I think it should not be problem unless persister never replies back.

apoorvmittal10 avatar Aug 30 '24 11:08 apoorvmittal10

@junrao Can I please get review on this.

apoorvmittal10 avatar Aug 30 '24 11:08 apoorvmittal10

@apoorvmittal10 : Thanks for the PR. The code LGTM. Are the test failure related? If not, have they all been tracked?

Unrelated tests failure, I have triaged the failures. I saw 1 failing for PlaintextAdminIntegrationTest.testShareGroups and verified locally that test is flaky, triaged that.

apoorvmittal10 avatar Sep 01 '24 20:09 apoorvmittal10