redpanda
Improved Compaction
Problem
Redpanda currently has some limitations in its compaction:
- Configuration batches are not compacted
- Segments of different terms are not merged in adjacent segment compaction
As a result, in a compacted topic whose keys are overwritten repeatedly over a long period, the on-disk state does not collapse into a small number of segments, and the total disk space used by a topic that keeps rewriting all of its keys does not stay constant. Both behaviours are contrary to what we would expect.
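To make the second limitation concrete, here is a toy model, not Redpanda code. It assumes each segment is described only by its raft term and that adjacent segment compaction may merge neighbours only when their terms match. The function name `merge_same_term_only` is hypothetical.

```python
from itertools import groupby
from typing import List


def merge_same_term_only(segment_terms: List[int]) -> List[int]:
    """Collapse each run of adjacent segments that share a term into one segment.

    Input: the term of each segment, in log order.
    Output: the term of each segment that remains after merging.
    Toy model only; real segments also carry offsets, sizes and indexes.
    """
    return [term for term, _run in groupby(segment_terms)]


# Eight segments written across four terms (three leadership changes).
remaining = merge_same_term_only([1, 1, 2, 2, 2, 3, 4, 4])
```

In this model the number of remaining segments can never drop below the number of distinct terms in the log, so a partition that sees many leadership changes keeps accumulating segments however often its keys are rewritten.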
Solution
If all raft group members have already replicated past the point in the log where we are compacting, then it should be safe to merge segments of different terms, and also to strip out configuration batches (with appropriate updates to the offset translator).
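The safety condition above can be sketched as follows. This is a minimal illustration under stated assumptions, not Redpanda's implementation. The `Segment` type, the `match_offsets` map (the highest offset each group member is known to have replicated) and both function names are hypothetical. Giving the merged segment the later term is also an assumption, and the offset translator update needed when configuration batches are stripped is not modelled here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Segment:
    base_offset: int  # first log offset in the segment
    last_offset: int  # last log offset in the segment (inclusive)
    term: int         # raft term in which the segment was written


def min_replicated_offset(match_offsets: Dict[str, int]) -> int:
    """Highest offset that every raft group member has already replicated."""
    return min(match_offsets.values())


def merge_adjacent(segments: List[Segment], safe_offset: int) -> List[Segment]:
    """Merge adjacent segments, in log order.

    Same-term neighbours are always merged. Neighbours with different terms
    are merged only if the later one ends at or below safe_offset, meaning
    every group member has already replicated past both segments.
    """
    merged: List[Segment] = []
    for seg in segments:
        if merged:
            prev = merged[-1]
            if prev.term == seg.term or seg.last_offset <= safe_offset:
                # Assumption: the merged segment is labelled with the later term.
                merged[-1] = Segment(prev.base_offset, seg.last_offset,
                                     max(prev.term, seg.term))
                continue
        merged.append(seg)
    return merged


segments = [Segment(0, 99, 1), Segment(100, 199, 2), Segment(200, 299, 3)]
safe = min_replicated_offset({"node-a": 250, "node-b": 199, "node-c": 300})
result = merge_adjacent(segments, safe)
```

With the slowest member at offset 199, the first two segments are merged even though their terms differ. The third segment extends past what every member has replicated, so it is left alone until the group catches up.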
Impact
Applications that make intensive use of compaction will no longer risk unexpectedly high consumption of disk space and system resources on Redpanda nodes.