[fix] [broker] fix incorrect delete sub when lastActive do not update.
Motivation
This PR https://github.com/apache/pulsar/pull/17573 has fixed cases that consumer commit offset periodically. It update lastActive field of cursor when consumer commit the offset.
But there is still corner case that the consumer do not commit the offset for any reason. For example, there is no content in the topic could be read, so the consumer do not commit the offset reasonably. In such case we has no reason delete the subscription.
Anyway, we should be cautious with the delete operation, which could result into data duplication or loss.
Modifications
Update the lastActive field when check the inactive sub.
Verifying this change
- [x] Make sure that the change passes the CI checks.
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment
Documentation
- [ ]
doc - [ ]
doc-required - [x]
doc-not-needed - [ ]
doc-complete
Matching PR in forked repository
PR in forked repository: https://github.com/thetumbled/pulsar/pull/31
why do we update the cursor when every check ?
If the consumer do not commit the offset for any reason, the lastActive will never be updated, which will result into incorrect subscription deletion.
As we know that the consumer is connected, we should update lastActive when doing check.
why do we update the cursor when every check ?
If the consumer do not commit the offset for any reason, the
lastActivewill never be updated, which will result into incorrect subscription deletion. As we know that the consumer is connected, we should updatelastActivewhen doing check.
in this case, the consumer is connected, how could the code run into sub.delete
https://github.com/apache/pulsar/blob/8beac8b12ef7c0ef54529fbb7e4e76c54dea6283/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java#L2982-L2985
why do we update the cursor when every check ?
If the consumer do not commit the offset for any reason, the
lastActivewill never be updated, which will result into incorrect subscription deletion. As we know that the consumer is connected, we should updatelastActivewhen doing check.in this case, the consumer is connected, how could the code run into
sub.deletehttps://github.com/apache/pulsar/blob/8beac8b12ef7c0ef54529fbb7e4e76c54dea6283/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java#L2982-L2985
If the broker1 shutdown ungracefully, the topic will be loaded on broker2 without update the lastActive field to zk, and the broker2 load the topic with old lastActive field from zk. If broker2 check the inactive sub before any consumer connect to this sub, the sub will be deleted.
This problem is simillar to https://github.com/apache/pulsar/pull/17573.
@thetumbled Could you add a test for this case? Thanks
I push a new commit to add the test code, PTAL, thanks.
PTAL, thanks. @dao-jun @lhotari @Technoboy- @poorbarcode @coderzc @codelipenghui
closed as https://github.com/apache/pulsar/pull/22794 has been merged.