bookkeeper
Adjust checkpoint timing adaptively according to throughput
Descriptions of the changes in this PR:
Motivation
When using the single entry log mode, checkpoint behavior depends on throughput. For example, at a write throughput of around 200 MB/s with the configuration parameter logSizeLimit at its default of 1.2 GB, checkpoint tasks are generated quickly, but each checkpoint task takes a long time to execute. Because these two rates do not match, a large number of tasks accumulate in the checkpoint thread, so the checkpoint position recorded in the lastMark file lags far behind the real-time data, and restarting the bookie becomes abnormally slow. Unnecessary checkpoint tasks can be eliminated based on the execution time of checkpoint tasks: the next effective checkpoint task is generated only after the previous checkpoint task has finished executing.
Changes
Specifically, every time a newly generated checkpoint task is submitted, the tasks already accumulated in the checkpoint thread pool are cleared first. This way, no matter how high the throughput is, the rate at which checkpoint tasks are generated and the rate at which they are executed stay matched.
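The idea above can be sketched with a plain `ThreadPoolExecutor` whose queue is drained before each new submission. This is a minimal illustration, not the PR's actual patch: the class and method names are hypothetical, and the real change operates on bookie internals rather than a standalone pool.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CheckpointQueueDemo {
    // Counts how many checkpoint tasks actually ran.
    static final AtomicInteger executed = new AtomicInteger();

    // Hypothetical stand-in for checkpoint work; sleeps to simulate a slow flush.
    static Runnable slowCheckpoint() {
        return () -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            executed.incrementAndGet();
        };
    }

    public static void main(String[] args) throws Exception {
        // Single checkpoint thread with an unbounded queue: tasks pile up
        // whenever they are produced faster than they execute.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());

        for (int i = 0; i < 10; i++) {
            // Drop any checkpoint still waiting in the queue before submitting:
            // a later checkpoint supersedes all earlier pending ones.
            pool.getQueue().clear();
            pool.execute(slowCheckpoint());
        }

        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        // Far fewer than 10 tasks run, because superseded ones were dropped.
        System.out.println("executed=" + executed.get());
    }
}
```

Note that `BlockingQueue.clear()` is a public API, whereas the PR as submitted reached the queue via reflection, which is what the review comment below objects to.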
Master Issue: #2896
@1559924775 thanks for spotting this. However I think the proposed solution isn't quite right. I think there should be a cleaner solution without resorting to reflection.
For example, one solution might be:
- In `doCheckpoint`, don't submit it to the executor. Run it synchronously within the runnable.
- Change `scheduleAtFixedRate` to `schedule`. Inside `doCheckpoint`, submit the work again using `schedule`, calculating how long the delay should be based on how long the checkpoint took to run.
There is also `scheduleWithFixedDelay`, which would be simpler, though it would always put the same delay between executions, even if a checkpoint takes a very long time.
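The self-rescheduling approach suggested above can be sketched as follows. This is an illustration under assumed names (`AdaptiveCheckpoint`, `INTERVAL_MS`, and the simulated flush are all hypothetical), not BookKeeper's actual `SyncThread` code: each checkpoint measures its own duration and schedules the next run with the remaining delay, so checkpoints never overlap or queue up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AdaptiveCheckpoint {
    // Desired spacing between checkpoint starts (hypothetical value).
    static final long INTERVAL_MS = 100;

    final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    final CountDownLatch done;

    AdaptiveCheckpoint(int rounds) {
        this.done = new CountDownLatch(rounds);
    }

    void start() {
        scheduler.schedule(this::doCheckpoint, INTERVAL_MS, TimeUnit.MILLISECONDS);
    }

    void doCheckpoint() {
        long start = System.nanoTime();
        try {
            Thread.sleep(30); // stand-in for the real flush/checkpoint work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        done.countDown();
        if (done.getCount() > 0) {
            // Shrink the next delay by how long this checkpoint took, so runs
            // stay roughly INTERVAL_MS apart; a slow checkpoint simply reduces
            // the gap to zero instead of letting tasks accumulate.
            long delay = Math.max(0, INTERVAL_MS - elapsedMs);
            scheduler.schedule(this::doCheckpoint, delay, TimeUnit.MILLISECONDS);
        } else {
            scheduler.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        AdaptiveCheckpoint cp = new AdaptiveCheckpoint(3);
        cp.start();
        System.out.println("completed=" + cp.done.await(5, TimeUnit.SECONDS));
    }
}
```

Because at most one checkpoint is ever scheduled at a time, there is nothing to clear from the queue, which avoids the reflection the original patch needed.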
Inactive authors can be removed starting from 4.16.0, I think. @hangc0276 @zymap
Fix the old workflow, please see #3455 for details.