
pageserver: during compaction, write image layers if it will enable physical space reduction

Open jcsp opened this issue 1 year ago • 0 comments

Background

The gc_feedback mechanism removed in https://github.com/neondatabase/neon/pull/6863 was meant to protect against edge cases where repeated keyspace repartitioning can result in stacks of deltas that are never fully covered by image layers, and therefore never get GC'd.

The history as I understand it is:

  • https://github.com/neondatabase/neon/pull/3673 added the wanted_image_layers mechanism to let GC request image layer generation.
  • It was realized that doing image layer generation on each GC cycle was very wasteful.
  • https://github.com/neondatabase/neon/pull/4353 added the gc_feedback tenant config to turn feedback off, and it has been off by default since then.

Purpose

This ticket tracks creating an improved mechanism to ensure that:

  1. Long-idle timelines are proactively compacted into image layers to reduce storage space.
  2. Edge case "gaps" in image layer coverage in compaction do not result in keeping old delta layers forever.
  3. Such proactive image layer generation must not result in non-root timelines copying large proportions of the parent timeline's data (i.e. preserve CoW behavior).
  4. Proactive image layer generation should not closely track the GC horizon, to avoid continuously generating new image layers as the GC horizon advances. It should also not continuously generate image layers if someone sets the PITR interval to 0.

The previous gc_feedback mechanism was not widely used because it satisfied 1 & 2 but not 3 & 4.

A replacement mechanism might not need to involve the GC code -- we can directly query the layer map during compaction and:

  • Calculate how much delta layer space can be recovered by writing an image layer for a particular partition, and decide whether to generate the image layer based on this ratio. That should satisfy requirement 3.
  • During compaction, use an age threshold for trying to cover a delta layer that doesn't have to be the same as the PITR interval: we might e.g. have a 1 day PITR interval but only bother doing proactive image layer generation after 7 days. That should satisfy requirement 4.
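
To make the two checks above concrete, here is a minimal sketch of the decision in Rust. It is not the pageserver's actual code: `PartitionStats`, `ProactiveImagePolicy`, and all field names are hypothetical, and it assumes that "this ratio" means recoverable delta bytes in this timeline divided by the estimated size of the new image layer, and that the age check is applied to the newest delta layer the image would cover.

```rust
use std::time::Duration;

/// Hypothetical per-partition summary, assumed to be computed from the
/// layer map during compaction. None of these names exist in neon.
#[derive(Debug, Clone, Copy)]
struct PartitionStats {
    /// Bytes of this timeline's delta layers that would become GC-able
    /// once an image layer covers the partition. Ancestor layers are
    /// deliberately not counted (assumption, to preserve CoW behavior).
    reclaimable_delta_bytes: u64,
    /// Estimated size of the image layer that would have to be written.
    estimated_image_bytes: u64,
    /// Age of the newest delta layer the image layer would cover.
    newest_covered_delta_age: Duration,
}

/// Hypothetical tunables for proactive image layer generation.
struct ProactiveImagePolicy {
    /// Minimum reclaimable-to-written ratio before an image is worth writing.
    min_space_ratio: f64,
    /// Minimum delta age; intentionally independent of the PITR interval.
    min_delta_age: Duration,
}

impl ProactiveImagePolicy {
    fn should_write_image(&self, stats: &PartitionStats) -> bool {
        // Requirement 4: do not chase the GC horizon; only act on deltas
        // older than a separate, longer threshold.
        if stats.newest_covered_delta_age < self.min_delta_age {
            return false;
        }
        // Nothing to write means nothing to decide.
        if stats.estimated_image_bytes == 0 {
            return false;
        }
        // Requirement 3: a child timeline with few deltas of its own gets a
        // low ratio, so it does not copy the parent's data into an image.
        let ratio =
            stats.reclaimable_delta_bytes as f64 / stats.estimated_image_bytes as f64;
        ratio >= self.min_space_ratio
    }
}
```

With a 7 day `min_delta_age` and a `min_space_ratio` of 2.0, a long-idle partition holding 300 MiB of deltas over a 100 MiB image would qualify, while a freshly branched timeline with 1 MiB of its own deltas over the same 100 MiB image would not.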

jcsp, Feb 23 '24 15:02