
[fix][broker] refactor cursor read entry process to fix dead loop read issue of txn

Open TakaHiR07 opened this issue 8 months ago • 3 comments

Motivation

  1. Fix the issue https://github.com/apache/pulsar/issues/22943. This issue is serious: it makes transactions unavailable.

This PR is similar to an earlier PR, https://github.com/apache/pulsar/pull/14286. Since that PR was closed, I reimplemented it on the master branch, improved some of the logic, and added tests.

  2. This PR is also related to another issue that needs improvement: https://github.com/apache/pulsar/issues/23027

Modifications

  1. Add a field named maxReadPosition to ManagedLedgerImpl. If a read op enables maxReadPosition, the read path additionally checks hasMoreEntriesByMaxReadPosition().
  2. Add a field named waitingCursorsByMaxReadPosition to ManagedLedgerImpl. When a cursor has a read op in the wait state, the read op is put into this queue. When maxReadPosition is updated, the read op is polled from the queue and notified.
  3. In TopicTransactionBuffer, every operation that updates maxReadPosition must also sync the new value to maxReadPosition in ManagedLedgerImpl.
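The park-and-notify flow in items 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `MaxReadPositionWaitSketch`, the `WaitingRead` type, and the methods `park` and `updateMaxReadPosition` are hypothetical names, and positions are reduced to a single `long` entry id for brevity. It also assumes that a notified read op re-checks its position and parks again if it is still beyond maxReadPosition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class MaxReadPositionWaitSketch {
    // Hypothetical stand-in for a read op parked by a cursor.
    static final class WaitingRead {
        final String cursorName;
        final long readEntryId;

        WaitingRead(String cursorName, long readEntryId) {
            this.cursorName = cursorName;
            this.readEntryId = readEntryId;
        }
    }

    // Plays the role of waitingCursorsByMaxReadPosition (simplified).
    private final Queue<WaitingRead> waiting = new ConcurrentLinkedQueue<>();
    private volatile long maxReadPosition;

    // Records which cursors were resumed, in order, so the flow is observable.
    final List<String> resumed = new ArrayList<>();

    MaxReadPositionWaitSketch(long initialMaxReadPosition) {
        this.maxReadPosition = initialMaxReadPosition;
    }

    // A cursor whose read position is beyond maxReadPosition parks its read op.
    void park(WaitingRead read) {
        waiting.add(read);
    }

    // Called when the transaction buffer advances maxReadPosition (assumption:
    // the value only moves forward). Every parked read op is polled once and
    // re-checked; ops that are still blocked go back into the queue.
    void updateMaxReadPosition(long newMax) {
        if (newMax <= maxReadPosition) {
            return;
        }
        maxReadPosition = newMax;
        int pending = waiting.size();
        for (int i = 0; i < pending; i++) {
            WaitingRead read = waiting.poll();
            if (read == null) {
                break;
            }
            if (read.readEntryId <= maxReadPosition) {
                resumed.add(read.cursorName);
            } else {
                waiting.add(read);
            }
        }
    }

    public static void main(String[] args) {
        MaxReadPositionWaitSketch ml = new MaxReadPositionWaitSketch(2);
        ml.park(new WaitingRead("sub-a", 3));
        ml.park(new WaitingRead("sub-b", 8));

        ml.updateMaxReadPosition(5);
        System.out.println(ml.resumed); // prints [sub-a]

        ml.updateMaxReadPosition(9);
        System.out.println(ml.resumed); // prints [sub-a, sub-b]
    }
}
```

Bounding the loop by the queue size at the time of the update keeps a re-parked op from being polled twice in the same pass.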

With this PR, the read logic is:

  • If readPosition <= lastConfirmedEntry && readPosition <= maxReadPosition, read immediately
  • If readPosition <= lastConfirmedEntry && readPosition > maxReadPosition, wait by max read position
  • If readPosition > lastConfirmedEntry, wait by cursor
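The three cases above amount to a small decision function. The sketch below is illustrative only: `ReadDecisionSketch`, `Pos`, `Action`, and `decide` are hypothetical names and do not appear in the Pulsar code base; `Pos` is a simplified (ledgerId, entryId) pair ordered lexicographically.

```java
public class ReadDecisionSketch {
    enum Action { READ_IMMEDIATELY, WAIT_BY_MAX_READ_POSITION, WAIT_BY_CURSOR }

    // Hypothetical simplified position: ordered by ledgerId, then entryId.
    static final class Pos implements Comparable<Pos> {
        final long ledgerId;
        final long entryId;

        Pos(long ledgerId, long entryId) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
        }

        @Override
        public int compareTo(Pos other) {
            int c = Long.compare(ledgerId, other.ledgerId);
            return c != 0 ? c : Long.compare(entryId, other.entryId);
        }
    }

    static Action decide(Pos readPosition, Pos lastConfirmedEntry, Pos maxReadPosition) {
        // Nothing has been written at this position yet: wait for new entries.
        if (readPosition.compareTo(lastConfirmedEntry) > 0) {
            return Action.WAIT_BY_CURSOR;
        }
        // The entry exists but is not yet readable under the transaction bound.
        if (readPosition.compareTo(maxReadPosition) > 0) {
            return Action.WAIT_BY_MAX_READ_POSITION;
        }
        return Action.READ_IMMEDIATELY;
    }

    public static void main(String[] args) {
        Pos lastConfirmedEntry = new Pos(1, 10);
        Pos maxReadPosition = new Pos(1, 5);

        System.out.println(decide(new Pos(1, 3), lastConfirmedEntry, maxReadPosition));  // prints READ_IMMEDIATELY
        System.out.println(decide(new Pos(1, 7), lastConfirmedEntry, maxReadPosition));  // prints WAIT_BY_MAX_READ_POSITION
        System.out.println(decide(new Pos(1, 11), lastConfirmedEntry, maxReadPosition)); // prints WAIT_BY_CURSOR
    }
}
```

Checking lastConfirmedEntry first keeps the three branches mutually exclusive, matching the bullets one to one.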

I added many comments in the code at the points that may be of concern.

This PR tries to keep the existing behavior unchanged when transactions are disabled; it aims to fix the issue only when transactions are enabled.

Verifying this change

  • [ ] Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • [ ] Dependencies (add or upgrade a dependency)
  • [ ] The public API
  • [ ] The schema
  • [ ] The default values of configurations
  • [ ] The threading model
  • [ ] The binary protocol
  • [ ] The REST endpoints
  • [ ] The admin CLI options
  • [ ] The metrics
  • [ ] Anything that affects deployment

Documentation

  • [ ] doc
  • [ ] doc-required
  • [x] doc-not-needed
  • [ ] doc-complete

Matching PR in forked repository

PR in forked repository:

TakaHiR07, Jun 19 '24 12:06