neon pageserver: handle WAL gaps on sharded tenants

Problem

In the test for https://github.com/neondatabase/neon/pull/6776, a test case uses tiny layer sizes and tiny stripe sizes. This hits a scenario where a shard's checkpoint interval spans a region of the WAL in which none of the content is ingested by this shard. Since there is no layer to flush, we do not advance disk_consistent_lsn, and the test fails while waiting for the LSN to advance.

Summary of changes

Pass an LSN (frozen_to_lsn) through layer_flush_start_tx. This is the LSN up to which we have frozen layers at the time we ask the flush task to flush them.
In the layer flush task, if the layers we flush do not reach frozen_to_lsn, advance disk_consistent_lsn up to frozen_to_lsn.

The net effect is that the disk_consistent_lsn is allowed to advance past regions in the WAL where a shard ingests no data.
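
A minimal sketch of the flush-side rule described above. This is not the pageserver's code: `Lsn`, `FrozenLayer`, `Timeline` and `flush_frozen_layers` are simplified, hypothetical stand-ins, and the actual write of a layer to disk is omitted. Only `frozen_to_lsn` and `disk_consistent_lsn` are names taken from this PR.

```rust
/// Log sequence number. A plain newtype for this sketch; the real
/// pageserver has its own LSN type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Lsn(pub u64);

/// Hypothetical stand-in for a frozen in-memory layer awaiting flush.
pub struct FrozenLayer {
    /// End of the LSN range covered by this layer.
    pub end_lsn: Lsn,
}

/// Hypothetical stand-in for the timeline state the flush task updates.
pub struct Timeline {
    pub disk_consistent_lsn: Lsn,
}

impl Timeline {
    /// Handle one flush request. `frozen_to_lsn` is the LSN that was sent
    /// along with the request. Returns the new disk_consistent_lsn.
    pub fn flush_frozen_layers(&mut self, layers: Vec<FrozenLayer>, frozen_to_lsn: Lsn) -> Lsn {
        let mut flushed_to = self.disk_consistent_lsn;
        for layer in layers {
            // Real code would write the layer to disk here.
            flushed_to = flushed_to.max(layer.end_lsn);
        }
        // If the flushed layers stop short of frozen_to_lsn (or there were
        // none), the remaining WAL contained nothing for this shard, so it
        // is safe to advance up to frozen_to_lsn. `max` also guarantees we
        // never move backwards.
        flushed_to = flushed_to.max(frozen_to_lsn);
        self.disk_consistent_lsn = flushed_to;
        flushed_to
    }
}
```

With this rule, a flush request carrying `frozen_to_lsn = 200` and no layers still moves `disk_consistent_lsn` from 100 to 200, which is the gap case the test was waiting on.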

Checklist before requesting a review

[ ] I have performed a self-review of my code.
[ ] If it is a core feature, I have added thorough tests.
[ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
[ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

[ ] Do not forget to reformat commit message to not include the above checklist

Feb 16 '24 14:02 jcsp

2754 tests run: 2630 passed, 0 failed, 124 skipped (full report)

Flaky tests (4)

Postgres 16

test_compute_pageserver_connection_stress: release
test_null_config: release
test_deletion_queue_recovery[no-validate-lose]: debug
test_vm_bit_clear_on_heap_lock: debug

Code coverage* (full report)

functions: 28.0% (6396 of 22867 functions)
lines: 46.8% (45023 of 96104 lines)

* collected from Rust tests only

The comment gets automatically updated with the latest test results: 4e3ea61501ad92b1a3e77d76d50e1f61e0f74121 at 2024-04-04T17:07:46.062Z

Feb 16 '24 15:02 github-actions[bot]

One of the reasons we didn't see this more widely is that nonzero shards still often do SLRU writes during ingest, even if they're not ingesting any of the data in a WAL record. This will probably be easier to test if I also make the change to only ingest SLRU writes on shard zero.

Mar 12 '24 18:03 jcsp

Changes in addition to review comments:

Making a stable test for this has highlighted the need to skip SLRU and checkpoint content on >0 shards; otherwise they too often get extra writes that prevent us from exploring the scenarios where a shard doesn't ingest anything within a region of the WAL.
Just advancing disk_consistent_lsn on layer flush wasn't sufficient: we also need to advance it when there is no layer to flush, and to advance remote_consistent_lsn as well. That is added in a second commit.
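
To illustrate the second point, here is a rough sketch of how remote_consistent_lsn might follow disk_consistent_lsn when there is no layer to upload. It assumes that the remote value only moves when something is uploaded, so a gap needs an explicit metadata-only upload. That mechanism and the names `Timeline`, `pending_metadata_uploads`, `advance_for_gap` and `complete_uploads` are assumptions made for illustration, not the code in the second commit.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Lsn(pub u64);

/// Hypothetical timeline state for this sketch.
pub struct Timeline {
    pub disk_consistent_lsn: Lsn,
    pub remote_consistent_lsn: Lsn,
    /// Queued metadata-only uploads, each carrying a disk_consistent_lsn.
    pub pending_metadata_uploads: Vec<Lsn>,
}

impl Timeline {
    /// Called for a flush request that found no layer to flush.
    pub fn advance_for_gap(&mut self, frozen_to_lsn: Lsn) {
        if frozen_to_lsn > self.disk_consistent_lsn {
            self.disk_consistent_lsn = frozen_to_lsn;
            // No layer upload will carry the new value to remote storage,
            // so queue a metadata-only upload for it.
            self.pending_metadata_uploads.push(frozen_to_lsn);
        }
    }

    /// Simulate the queued uploads completing successfully.
    pub fn complete_uploads(&mut self) {
        for lsn in self.pending_metadata_uploads.drain(..) {
            self.remote_consistent_lsn = self.remote_consistent_lsn.max(lsn);
        }
    }
}
```

In this model, remote_consistent_lsn lags until the queued upload completes, and a stale request with a lower LSN queues nothing.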

Apr 03 '24 16:04 jcsp

I backed out the changes to ingest logic, and loosened the test slightly to tolerate that. The ingest changes were kind of ugly/fragile, and in any case it was incorrect to drop all SLRU content on shards >0, because they relied on it for GC bound calculation.

Apr 04 '24 13:04 jcsp
