pulsar icon indicating copy to clipboard operation
pulsar copied to clipboard

[fix] [broker] fix incorrect delete sub when lastActive do not update.

Open thetumbled opened this issue 2 years ago • 3 comments

Motivation

This PR https://github.com/apache/pulsar/pull/17573 has fixed cases that consumer commit offset periodically. It update lastActive field of cursor when consumer commit the offset. But there is still corner case that the consumer do not commit the offset for any reason. For example, there is no content in the topic could be read, so the consumer do not commit the offset reasonably. In such case we has no reason delete the subscription. Anyway, we should be cautious with the delete operation, which could result into data duplication or loss.

Modifications

Update the lastActive field when check the inactive sub.

Verifying this change

  • [x] Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • [ ] Dependencies (add or upgrade a dependency)
  • [ ] The public API
  • [ ] The schema
  • [ ] The default values of configurations
  • [ ] The threading model
  • [ ] The binary protocol
  • [ ] The REST endpoints
  • [ ] The admin CLI options
  • [ ] The metrics
  • [ ] Anything that affects deployment

Documentation

  • [ ] doc
  • [ ] doc-required
  • [x] doc-not-needed
  • [ ] doc-complete

Matching PR in forked repository

PR in forked repository: https://github.com/thetumbled/pulsar/pull/31

thetumbled avatar Dec 08 '23 04:12 thetumbled

why do we update the cursor when every check ?

If the consumer do not commit the offset for any reason, the lastActive will never be updated, which will result into incorrect subscription deletion. As we know that the consumer is connected, we should update lastActive when doing check.

thetumbled avatar Dec 14 '23 02:12 thetumbled

why do we update the cursor when every check ?

If the consumer do not commit the offset for any reason, the lastActive will never be updated, which will result into incorrect subscription deletion. As we know that the consumer is connected, we should update lastActive when doing check.

in this case, the consumer is connected, how could the code run into sub.delete https://github.com/apache/pulsar/blob/8beac8b12ef7c0ef54529fbb7e4e76c54dea6283/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java#L2982-L2985

Technoboy- avatar Dec 21 '23 10:12 Technoboy-

why do we update the cursor when every check ?

If the consumer do not commit the offset for any reason, the lastActive will never be updated, which will result into incorrect subscription deletion. As we know that the consumer is connected, we should update lastActive when doing check.

in this case, the consumer is connected, how could the code run into sub.delete

https://github.com/apache/pulsar/blob/8beac8b12ef7c0ef54529fbb7e4e76c54dea6283/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java#L2982-L2985

If the broker1 shutdown ungracefully, the topic will be loaded on broker2 without update the lastActive field to zk, and the broker2 load the topic with old lastActive field from zk. If broker2 check the inactive sub before any consumer connect to this sub, the sub will be deleted. This problem is simillar to https://github.com/apache/pulsar/pull/17573.

thetumbled avatar Dec 22 '23 06:12 thetumbled

@thetumbled Could you add a test for this case? Thanks

I push a new commit to add the test code, PTAL, thanks.

thetumbled avatar May 28 '24 09:05 thetumbled

PTAL, thanks. @dao-jun @lhotari @Technoboy- @poorbarcode @coderzc @codelipenghui

thetumbled avatar May 28 '24 09:05 thetumbled

closed as https://github.com/apache/pulsar/pull/22794 has been merged.

thetumbled avatar Jun 03 '24 02:06 thetumbled