neon
pageserver: handle WAL gaps on sharded tenants
Problem
In the test for https://github.com/neondatabase/neon/pull/6776, a test case uses tiny layer sizes and tiny stripe sizes. This hits a scenario where a shard's checkpoint interval spans a region where none of the content in the WAL is ingested by this shard. Since there is no layer to flush, we do not advance disk_consistent_lsn, and this causes the test to fail while waiting for the LSN to advance.
Summary of changes
- Pass an LSN through `layer_flush_start_tx`. This is the LSN to which we have frozen at the time we ask the flush to flush layers frozen up to this point.
- In the layer flush task, if the layers we flush do not reach `frozen_to_lsn`, then advance disk_consistent_lsn up to this point.
The net effect is that the disk_consistent_lsn is allowed to advance past regions in the WAL where a shard ingests no data.
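The flush-side change can be sketched roughly as follows. This is a simplified illustration, not the pageserver's actual code: `Lsn` is reduced to a `u64` wrapper, and `FlushState` and `flush_frozen_up_to` are hypothetical names. Only `frozen_to_lsn` and `disk_consistent_lsn` come from the description above.

```rust
/// Simplified stand-in for the pageserver's LSN type (assumption: a plain u64 wrapper).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Lsn(u64);

/// Hypothetical, reduced flush state; the real timeline tracks far more.
struct FlushState {
    disk_consistent_lsn: Lsn,
}

impl FlushState {
    /// Flush the frozen layers (modelled here only by their end LSNs), then
    /// advance disk_consistent_lsn to frozen_to_lsn if the flushed layers
    /// stopped short of it.
    fn flush_frozen_up_to(&mut self, frozen_layer_end_lsns: &[Lsn], frozen_to_lsn: Lsn) {
        for &end in frozen_layer_end_lsns {
            // Writing the layer to disk is elided in this sketch.
            self.disk_consistent_lsn = self.disk_consistent_lsn.max(end);
        }
        // The shard ingested nothing between the last layer and frozen_to_lsn,
        // so it is safe to treat that WAL range as durable for this shard.
        if self.disk_consistent_lsn < frozen_to_lsn {
            self.disk_consistent_lsn = frozen_to_lsn;
        }
    }
}

fn main() {
    let mut state = FlushState { disk_consistent_lsn: Lsn(100) };
    // The only frozen layer ends at 150, but we froze up to 200.
    state.flush_frozen_up_to(&[Lsn(150)], Lsn(200));
    println!("{:?}", state.disk_consistent_lsn);
}
```

The sketch never moves disk_consistent_lsn backwards: a `frozen_to_lsn` at or below the current value leaves it unchanged.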
Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.
Checklist before merging
- [ ] Do not forget to reformat commit message to not include the above checklist
2754 tests run: 2630 passed, 0 failed, 124 skipped (full report)
Flaky tests (4)
Postgres 16
Code coverage* (full report)
- functions: 28.0% (6396 of 22867 functions)
- lines: 46.8% (45023 of 96104 lines)
* collected from Rust tests only
4e3ea61501ad92b1a3e77d76d50e1f61e0f74121 at 2024-04-04T17:07:46.062Z :recycle:
One of the reasons we didn't see this more widely is that nonzero shards still often do SLRU writes during ingest, even if they're not ingesting any of the data in a WAL record. This will probably be easier to test if I also make the change to only ingest SLRU writes on shard zero.
Changes in addition to review comments:
- Making a stable test for this has highlighted the need to skip SLRU and checkpoint content on >0 shards, otherwise they far too often get extra writes that prevent us from exploring the scenarios where a shard doesn't ingest anything within a region of the WAL.
- Just advancing disk_consistent_lsn on layer flush wasn't sufficient: we also need to advance it when there is no layer to flush, and to advance remote_consistent_lsn as well. That is added in a second commit.
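A rough sketch of the second point, under stated assumptions: `ConsistentLsns`, `on_freeze_without_layers` and `on_metadata_uploaded` are hypothetical names, and the idea that remote_consistent_lsn follows a completed metadata upload is an assumption made for illustration, not a description of the actual commit.

```rust
/// Simplified stand-in for the pageserver's LSN type (assumption: a plain u64 wrapper).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Lsn(u64);

/// Hypothetical pair of consistency markers for one shard's timeline.
struct ConsistentLsns {
    disk: Lsn,
    remote: Lsn,
}

impl ConsistentLsns {
    /// Called when we froze up to `frozen_to_lsn` but there was no layer to flush.
    /// Returns true if disk_consistent_lsn moved, meaning a metadata upload
    /// (assumed mechanism) should be scheduled so the remote side can follow.
    fn on_freeze_without_layers(&mut self, frozen_to_lsn: Lsn) -> bool {
        if frozen_to_lsn <= self.disk {
            return false; // nothing new to record
        }
        self.disk = frozen_to_lsn;
        true
    }

    /// Called once the (assumed) metadata upload carrying `uploaded_lsn` completes.
    fn on_metadata_uploaded(&mut self, uploaded_lsn: Lsn) {
        self.remote = self.remote.max(uploaded_lsn);
    }
}

fn main() {
    let mut lsns = ConsistentLsns { disk: Lsn(100), remote: Lsn(100) };
    if lsns.on_freeze_without_layers(Lsn(300)) {
        // In this sketch the upload completes immediately.
        lsns.on_metadata_uploaded(lsns.disk);
    }
    println!("disk={:?} remote={:?}", lsns.disk, lsns.remote);
}
```

Keeping the two steps separate reflects that the remote marker can lag the local one until the upload finishes.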
I backed out the changes to ingest logic, and loosened the test slightly to tolerate that. The ingest changes were kind of ugly/fragile, and in any case it was incorrect to drop all SLRU content on shards >0, because they relied on it for GC bound calculation.