
Improved Compaction

Open jcsp opened this issue 2 years ago • 0 comments

Problem

Redpanda currently has some limitations in its compaction:

  • Configuration batches are not compacted
  • Segments of different terms are not merged in adjacent segment compaction

As a result, when the keys of a compacted topic are overwritten repeatedly over a long period, the on-disk state does not collapse into a small number of segments, and the disk space used by a topic that keeps rewriting all of its keys does not stay constant. Both outcomes are contrary to what we would expect from compaction.

Solution

If all Raft group members have already replicated past the point in the log that is being compacted, then it should be safe to merge segments of different terms, and also to strip out configuration batches (with appropriate updates to the offset translator).
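
The safety condition above can be illustrated with a minimal sketch. This is not Redpanda code: `Segment`, `fully_replicated_offset`, `plan_merges`, and the `match_offsets` map are hypothetical names invented for illustration, and the sketch ignores segment size limits, configuration batch removal, and the offset translator. It only shows how a "replicated by everyone" offset could gate cross-term merging of adjacent segments.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for an on-disk log segment."""
    base_offset: int
    last_offset: int
    term: int


def fully_replicated_offset(match_offsets: Dict[str, int]) -> int:
    """Highest offset that every group member has replicated.

    match_offsets maps a (hypothetical) node name to the last offset
    that node is known to have replicated.
    """
    return min(match_offsets.values())


def plan_merges(segments: List[Segment], safe_offset: int) -> List[List[Segment]]:
    """Group adjacent segments into runs that could be merged together.

    A segment that ends at or below safe_offset may join the previous run
    even if the terms differ. Above safe_offset, only same-term neighbours
    are grouped, which models the existing restriction.
    """
    runs: List[List[Segment]] = []
    for seg in sorted(segments, key=lambda s: s.base_offset):
        if runs:
            prev = runs[-1][-1]
            contiguous = prev.last_offset + 1 == seg.base_offset
            # prev precedes seg, so if seg is below the safe offset, prev is too.
            below_safe_offset = seg.last_offset <= safe_offset
            same_term = prev.term == seg.term
            if contiguous and (below_safe_offset or same_term):
                runs[-1].append(seg)
                continue
        runs.append([seg])
    return runs


segments = [
    Segment(0, 9, term=1),
    Segment(10, 19, term=1),
    Segment(20, 29, term=2),
    Segment(30, 39, term=3),
    Segment(40, 49, term=3),
]

# Slowest member has replicated up to offset 29.
safe = fully_replicated_offset({"node-a": 45, "node-b": 32, "node-c": 29})

cross_term_runs = plan_merges(segments, safe)
# Passing -1 means no segment counts as fully replicated: term-bounded merging only.
same_term_runs = plan_merges(segments, -1)
```

With these inputs, the term-bounded plan keeps the term 2 segment on its own, while the plan that uses the fully replicated offset folds the first three segments into one run. Segments above that offset are still grouped only by term.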

Impact

Applications that make intensive use of compaction will no longer risk unexpectedly high consumption of disk space and system resources on Redpanda nodes.

jcsp, Sep 16 '22 13:09