
Epic: pageserver image layer compression

jcsp opened this issue 2 years ago · 5 comments

Background

We may substantially decrease the capacity & bandwidth footprint of tenants by compressing data in their image layers.

There are many possible implementations, from compressing whole layer files as streams, to introducing a chunked format and decompressing one chunk at a time, to simply compressing individual pages.

Compressing individual pages in image layers is by far the simplest thing to do, and should have a high payoff as:

  • image layers are often the majority of a tenant's storage footprint.
  • image layers store 8 KiB pages, which should be large enough to compress meaningfully.

Compressing deltas is a harder problem (individual deltas are likely too small to usefully compress), and is left as a possible future change.
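The per-page approach can be sketched with a "store uncompressed if compression doesn't help" fallback, which also sidesteps re-compressing already-compressed user data. This is an illustrative sketch, not Neon's implementation: zlib stands in for zstd/LZ4 (whose Python bindings are third-party), and the helper names are hypothetical.

```python
# Sketch: per-page compression with a fall-back to storing the raw page.
# zlib is a stand-in for zstd/LZ4; helper names are hypothetical.
import os
import zlib

PAGE_SIZE = 8192  # 8 KiB pages, as stored in image layers

def compress_page(page: bytes) -> tuple[bool, bytes]:
    """Return (compressed?, payload). Keep the original bytes when
    compression would not save space."""
    assert len(page) == PAGE_SIZE
    candidate = zlib.compress(page, 1)  # low level: read path is hot
    if len(candidate) < len(page):
        return True, candidate
    return False, page

def decompress_page(compressed: bool, payload: bytes) -> bytes:
    return zlib.decompress(payload) if compressed else payload

# A zero-filled page compresses well; a random page (resembling
# already-compressed user data) does not, and is stored verbatim.
flag, payload = compress_page(b"\x00" * PAGE_SIZE)
assert flag and len(payload) < PAGE_SIZE
rnd = os.urandom(PAGE_SIZE)
flag2, payload2 = compress_page(rnd)
assert decompress_page(flag2, payload2) == rnd
```

The size comparison doubles as a cheap incompressibility detector: pages that are already compressed gain nothing and are passed through unchanged.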

Implementation

There is a preliminary version here: https://github.com/neondatabase/neon/pull/7091, which demonstrates that per-page compression in image layers may be added as a relatively lightweight code change.

To get this ready for production, there is more work to do:

  • [x] Evaluate compression algorithms on realistic datasets. We should analyze:
    • zstd
    • LZ4
    • zstd/LZ4 plus dictionaries: we could craft a dictionary-per-layer to get better compression of each page in the layer.
    • Pay particular attention to read performance: this is the part that will be in the hot path for getpage latency.
  • [x] Revise the page header format to enable stashing compression flags -- we currently have a four-byte header, which is gratuitously large, and we should be able to store compression info in it without adding more header bytes (discussed at https://github.com/neondatabase/neon/pull/7091#discussion_r1521750331)
  • [ ] Handle compressed user data efficiently: if the user's data is already compressed, we should detect that and avoid re-compressing it on the pageserver (discussed at https://github.com/neondatabase/neon/pull/7091#discussion_r1521803603)
  • [x] Define a phased roll-out approach: there may be significantly more CPU load once compression is in use.
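One way the header task above could play out is to repurpose unused high bits of the four-byte length word as a compression tag; this works when blob lengths are capped below 2^28 bytes (256 MiB). The layout below is a hypothetical illustration, not Neon's actual on-disk format, and the tag values are made up.

```python
# Hypothetical 4-byte blob header: low 28 bits = length, top 4 bits =
# compression tag. Valid only because blob lengths are capped below
# 2**28 bytes. Illustrative layout, not Neon's real encoding.
import struct

LEN_BITS = 28
LEN_MASK = (1 << LEN_BITS) - 1
COMP_NONE, COMP_ZSTD, COMP_LZ4 = 0, 1, 2  # hypothetical tag values

def pack_header(length: int, comp: int) -> bytes:
    if length > LEN_MASK:
        raise ValueError("blob too large for 28-bit length field")
    return struct.pack(">I", (comp << LEN_BITS) | length)

def unpack_header(raw: bytes) -> tuple[int, int]:
    word, = struct.unpack(">I", raw)
    return word & LEN_MASK, word >> LEN_BITS

hdr = pack_header(8192, COMP_ZSTD)
assert unpack_header(hdr) == (8192, COMP_ZSTD)
```

The design choice here is that no header bytes are added: readers that know the cap can mask off the tag bits, and a tag of zero keeps old uncompressed blobs readable.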

PRs/issues

  • [x] #7852
  • [x] https://github.com/neondatabase/neon/pull/7879 --> experiment was done outside of main
  • [x] #8106
  • [x] #8225 -- not needed in the end
  • [x] #8238
  • [x] #8252
  • [x] #8257
  • [x] #8265
  • [x] #8281
  • [x] #8291
  • [x] #8288 --> test with deliberately broken non-vectored decompression; the large amount of breakage confirmed that the test suite actually relies on it working.
  • [x] #8324 --> test with deliberately broken vectored blob reading, compressed or not; the large amount of breakage confirmed that the test suite actually relies on it working.
  • [x] #8300
  • [x] #8302
  • [x] #8363
  • [x] #8368
  • [x] https://github.com/neondatabase/neon/pull/8420 by @jcsp
  • [x] https://github.com/neondatabase/neon/pull/8522

Rollout

  • [x] https://github.com/neondatabase/aws/pull/1612
  • [x] https://github.com/neondatabase/aws/pull/1624
  • [x] https://github.com/neondatabase/azure/pull/284
  • [x] https://github.com/neondatabase/aws/pull/1636
  • [x] https://github.com/neondatabase/aws/pull/1710
  • [x] https://github.com/neondatabase/neon/pull/8677
  • [ ] https://github.com/neondatabase/aws/pull/1744
  • [ ] https://github.com/neondatabase/azure/pull/314

jcsp · Oct 02 '23 08:10

Last week:

  • arpad wrote a tool to compress image layers #7879

This week:

  • identify interesting / representative tenants / layers
  • determine achievable space savings by running the tool against the identified layers

problame · May 27 '24 13:05

This week:

  • implement decompression
  • compare decompression speed
  • have a meeting with Konstantin, Stas, and John later this week
    • decide which algorithm to choose for now
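A decompression-speed comparison of the kind planned here can be sketched with stdlib codecs; zlib and lzma stand in for the zstd/LZ4 candidates, whose Python bindings are third-party.

```python
# Rough harness for comparing decompression speed on one 8 KiB page.
# zlib and lzma (stdlib) stand in for zstd/LZ4.
import lzma
import time
import zlib

PAGE = bytes(range(256)) * 32  # 8192 bytes of mildly repetitive data

def bench(name, compress, decompress, iterations=200):
    blob = compress(PAGE)
    start = time.perf_counter()
    for _ in range(iterations):
        out = decompress(blob)
    elapsed = time.perf_counter() - start
    assert out == PAGE  # sanity: round-trip must be lossless
    ratio = len(blob) / len(PAGE)
    print(f"{name}: ratio={ratio:.2f}, "
          f"{iterations / elapsed:.0f} decompressions/s")

bench("zlib", lambda d: zlib.compress(d, 6), zlib.decompress)
bench("lzma", lzma.compress, lzma.decompress)
```

For the real evaluation the decompression side matters most, since it sits in the getpage hot path; compression runs only at layer-write time.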

koivunej · Jun 10 '24 13:06

Week Jul 1-5:

  • big implementation week, filed many PRs:
    • #8238
    • #8252
    • #8257
    • #8265
    • #8281
    • #8291
  • we now support configuring compression via the image_compression config param.
  • wrote a scrubber subcommand to look for image layers >250 MB.
  • ran this scrubber subcommand against prod; there were no such image layers. This implies there is no blob above that limit either, which lets us repurpose the bits that would encode such large blobs.
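For reference, the image_compression parameter might be set in the pageserver config roughly like this; the key placement and accepted value syntax here are assumptions, so the PRs above are authoritative:

```toml
# hypothetical pageserver.toml fragment -- value syntax is an assumption
image_compression = "zstd(1)"   # or "disabled" to turn compression off
```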

Week Jul 8-12:

  • the deployed release now forbids writing blobs >=256 MiB to both image and delta layers.
  • ran scrubber again after the release, to ensure no blob >=256MiB was added in the window between the first scrubber run and the release, according to Christian's plan: https://github.com/neondatabase/neon/pull/8238#issuecomment-2206472870
  • big testing week; debugged in:
    • #8288
    • #8324
  • got PRs from last week merged:
    • #8300
    • #8302
  • testing found an oversight in settings passing; filed #8238 for it and got it merged.
  • New testing PR: #8368

arpad-m · Jul 12 '24 16:07

Following @Bodobolero's benchmarks: add LZ4 support for comparison.

koivunej · Aug 26 '24 13:08

We talked about this in the call and agreed that, until further investigation identifies compression as the culprit, we will not spend developer time on this.

arpad-m · Aug 26 '24 15:08

I think this can be closed now.

arpad-m · Sep 02 '24 12:09