bookkeeper icon indicating copy to clipboard operation
bookkeeper copied to clipboard

BP-67: Support skipping compaction at busy times

Open thetumbled opened this issue 1 year ago • 5 comments

Proposal PR for BP-67. Master Issue: https://github.com/apache/bookkeeper/issues/4382 Implementation PR: https://github.com/apache/bookkeeper/pull/4385

thetumbled avatar May 22 '24 10:05 thetumbled

PTAL, thanks. @shoothzj @eolivelli @dlg99 @horizonzy

thetumbled avatar May 22 '24 12:05 thetumbled

For example, the cpu.idle will decrease significantly while doing compaction. image

thetumbled avatar May 23 '24 02:05 thetumbled

I am rather indifferent about the change. There was an attempt in the to implement something similar: #851

It was abandoned, as I remember, because the company where the author worked had chosen to use REST API and cron to run enable/disable compaction.

So you already can disable the compaction on schedule just not via bookie configuration.

I'd support a mechanism that would dynamically pause/resume major/minor compaction if the load on bookie is high (e.g. as tracked by pending requests or request queue latency) if the disk space is enough.

To support dynamically pause/resume major/minor compaction, we need to introduce two interface:

  • LoadStatCollector collect the load stat for other modules.
  • CompactionStrategy decide whether do compaction based on load and any other stats, conf. WDYT @dlg99 @eolivelli @shoothzj

thetumbled avatar May 24 '24 09:05 thetumbled

I am rather indifferent about the change. There was an attempt in the to implement something similar: #851 It was abandoned, as I remember, because the company where the author worked had chosen to use REST API and cron to run enable/disable compaction. So you already can disable the compaction on schedule just not via bookie configuration. I'd support a mechanism that would dynamically pause/resume major/minor compaction if the load on bookie is high (e.g. as tracked by pending requests or request queue latency) if the disk space is enough.

To support dynamically pause/resume major/minor compaction, we need to introduce two interface:

  • LoadStatCollector collect the load stat for other modules.
  • CompactionStrategy decide whether do compaction based on load and any other stats, conf. WDYT @dlg99 @eolivelli @shoothzj

@thetumbled suspend and resume compaction based on load will make the system more complex and hard to configure a reasonable threshold. Moreover, if we debug the removed ledger not cleaned up, it's hard to explain why the compaction service was suspended. To reduce the compaction impact in bookie service, we can:

  • Introduce accurate throttle for compaction service
  • Avoid or suspend the compaction service in high-traffic time

hangc0276 avatar May 26 '24 01:05 hangc0276

There is one point need to be discussed: we should check it at the beginning of the compaction, but should we check it each time before we try to compact a new entry log? If we stop the compaction at the running time, should we resume this major/minor compaction or just start a new compaction? For example, if the major compaction departs at 0:00 every day and the skip time range is set to 1-3 o'clock, the major will be suspended as soon as it departs, and the major compaction cannot be completed. For users who want to do a complete major compaction, it will become one problem. @hangc0276 @eolivelli @shoothzj @horizonzy

thetumbled avatar May 29 '24 03:05 thetumbled