bookkeeper
Adjust checkpoint timing adaptively according to throughput
Descriptions of the changes in this PR:
Motivation
When using the single entry log mode, checkpoint behavior depends on throughput. For example, at a write throughput of around 200 MB/s with the configuration parameter logSizeLimit at its default of 1.2 GB, checkpoint tasks are generated quickly, but each checkpoint task takes a long time to execute. Because these two rates do not match, a large number of tasks accumulate in the checkpoint thread, so the checkpoint position recorded in the lastMark file lags far behind the real-time data, and restarting the bookie becomes abnormally slow. Unnecessary checkpoint tasks can be eliminated based on the execution time of checkpoint tasks: the next effective checkpoint task is generated only after the previous checkpoint task has finished executing.
Changes
Specifically, every time a newly generated checkpoint task is submitted, the tasks already accumulated in the checkpoint thread pool are cleared first. This way, no matter how high the throughput is, the rate at which checkpoint tasks are generated and the rate at which they are executed stay matched.
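The idea above can be sketched with a plain `ThreadPoolExecutor` whose queue is drained before each new submission. This is a minimal illustration, not the PR's actual patch: the class and method names are hypothetical, and the real change operates on bookie internals rather than a standalone pool.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CheckpointQueueDemo {
    // Counts how many checkpoint tasks actually ran.
    static final AtomicInteger executed = new AtomicInteger();

    // Hypothetical stand-in for checkpoint work; sleeps to simulate a slow flush.
    static Runnable slowCheckpoint() {
        return () -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            executed.incrementAndGet();
        };
    }

    public static void main(String[] args) throws Exception {
        // Single checkpoint thread with an unbounded queue: tasks pile up
        // whenever they are produced faster than they execute.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());

        for (int i = 0; i < 10; i++) {
            // Drop any checkpoint still waiting in the queue before submitting:
            // a later checkpoint supersedes all earlier pending ones.
            pool.getQueue().clear();
            pool.execute(slowCheckpoint());
        }

        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        // Far fewer than 10 tasks run, because superseded ones were dropped.
        System.out.println("executed=" + executed.get());
    }
}
```

Note that `BlockingQueue.clear()` is a public API, whereas the PR as submitted reached the queue via reflection, which is what the review comment below objects to.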
Master Issue: #2896
@1559924775 thanks for spotting this. However I think the proposed solution isn't quite right. I think there should be a cleaner solution without resorting to reflection.
For example, one solution might be:
- In `doCheckpoint`, don't submit it to the executor. Run it synchronously within the runnable.
- Change `scheduleAtFixedRate` to `schedule`. Inside `doCheckpoint`, submit the work again using `schedule`, calculating how long the delay should be based on how long the checkpoint took to run.
There is also `scheduleWithFixedDelay`, which would be simpler, though it would always put the same delay between executions, even if a checkpoint takes a very long time.
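The self-rescheduling approach suggested above can be sketched as follows. This is an illustration under assumed names (`AdaptiveCheckpoint`, `INTERVAL_MS`, and the simulated flush are all hypothetical), not BookKeeper's actual `SyncThread` code: each checkpoint measures its own duration and schedules the next run with the remaining delay, so checkpoints never overlap or queue up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AdaptiveCheckpoint {
    // Desired spacing between checkpoint starts (hypothetical value).
    static final long INTERVAL_MS = 100;

    final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    final CountDownLatch done;

    AdaptiveCheckpoint(int rounds) {
        this.done = new CountDownLatch(rounds);
    }

    void start() {
        scheduler.schedule(this::doCheckpoint, INTERVAL_MS, TimeUnit.MILLISECONDS);
    }

    void doCheckpoint() {
        long start = System.nanoTime();
        try {
            Thread.sleep(30); // stand-in for the real flush/checkpoint work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        done.countDown();
        if (done.getCount() > 0) {
            // Shrink the next delay by how long this checkpoint took, so runs
            // stay roughly INTERVAL_MS apart; a slow checkpoint simply reduces
            // the gap to zero instead of letting tasks accumulate.
            long delay = Math.max(0, INTERVAL_MS - elapsedMs);
            scheduler.schedule(this::doCheckpoint, delay, TimeUnit.MILLISECONDS);
        } else {
            scheduler.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        AdaptiveCheckpoint cp = new AdaptiveCheckpoint(3);
        cp.start();
        System.out.println("completed=" + cp.done.await(5, TimeUnit.SECONDS));
    }
}
```

Because at most one checkpoint is ever scheduled at a time, there is nothing to clear from the queue, which avoids the reflection the original patch needed.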
Inactive authors can be removed starting from 4.16.0, I think. @hangc0276 @zymap
Fix the old workflow, please see #3455 for details.