
pageserver: getpage requests sometimes skip reading recently written image layers

Open jcsp opened this issue 1 year ago • 2 comments

Found while investigating https://github.com/neondatabase/neon/issues/9058. In that issue, getpage requests were observed visiting layers older than recently written image layers.

It seems that under some circumstances, a getpage request at the exact LSN where an image layer exists can fail to hit that image layer. It is not clear whether being at the exact same LSN matters: it might just be that reads don't hit image layers until the current in-memory layer is closed.

There is a lot of uncertainty here; I am not claiming to have conclusively diagnosed this.

Branch with experimental test: https://github.com/neondatabase/neon/tree/jcsp/layer-map-search-at-image-lsn-2

In that branch, there are some log lines hacked in to record which layers are visited at INFO level. In the test, there is a checkpoint line commented out:

    # Uncomment this checkpoint, and the logs will show getpage requests hitting the image layers we
    # just created.  However, without the checkpoint, getpage requests will hit one InMemoryLayer and
    # one persistent delta layer.
    # env.pageserver.http_client().timeline_checkpoint(tenant_id, timeline_id, wait_until_uploaded=True)

The presence or absence of in-memory layers shouldn't make any difference to whether reads hit an image layer, but apparently it does.

jcsp, Sep 27 '24 17:09

A cleaner reproducer that uses layer eviction + on-demand downloads to prove which layers are touched by a getpage request: https://github.com/neondatabase/neon/tree/jcsp/layer-map-search-at-image-lsn-3

This test does reads at exactly the LSN of the image layer, but I can also reproduce the issue with some writes between generating the image layer and doing the read, so this is not something that only occurs when reading exactly at the image layer's LSN. I suspect our reads are skipping the image layer until the next time we freeze the ephemeral layer.

jcsp, Oct 14 '24 16:10

Perhaps this piece of logic is at fault in get_vectored_reconstruct_data_timeline:

                match in_memory_layer {
                    Some(l) => {
                        let lsn_range = l.get_lsn_range().start..cont_lsn;
                        fringe.update(
                            ReadableLayer::InMemoryLayer(l),
                            unmapped_keyspace.clone(),
                            lsn_range,
                        );
                    }

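...because lsn_range is constructed from the absolute start of the layer. Our cont_lsn jumps back to the start of the oldest in-memory layer before we start looking at historic layers at all. If that is right, an image layer written at an LSN above that start would never be considered by the historic search.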
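To make the suspected mechanism concrete, here is a minimal toy model of it. This is only a sketch of the hypothesis above, not the pageserver's code: `Visited`, `Historic` and `plan_read` are hypothetical names, LSNs are plain `u64`s, keys are ignored entirely, and the historic search is reduced to "pick the newest layer below `cont_lsn`".

```rust
// Toy model of the suspected behaviour. Every name here is hypothetical;
// none of this is the pageserver's real layer map or read path.

#[derive(Debug, PartialEq)]
enum Visited {
    InMemory { start: u64, end: u64 },
    Image { lsn: u64 },
    Delta { start: u64, end: u64 },
}

struct Historic {
    image_lsns: Vec<u64>,    // each image layer is valid at a single LSN
    deltas: Vec<(u64, u64)>, // delta layers as [start, end) LSN ranges
}

/// Returns the layers a read at `request_lsn` would visit in this toy model.
fn plan_read(inmem_start: Option<u64>, historic: &Historic, request_lsn: u64) -> Vec<Visited> {
    let mut visited = Vec::new();
    let mut cont_lsn = request_lsn + 1; // exclusive upper bound of the search

    // Suspected problem: the in-memory layer is visited over its whole LSN
    // range, so cont_lsn drops to the layer's start before any historic
    // layer has been looked at.
    if let Some(start) = inmem_start {
        if start < cont_lsn {
            visited.push(Visited::InMemory { start, end: cont_lsn });
            cont_lsn = start;
        }
    }

    // Simplified historic search: only layers strictly below cont_lsn count.
    loop {
        let image = historic.image_lsns.iter().copied().filter(|&l| l < cont_lsn).max();
        let delta = historic
            .deltas
            .iter()
            .copied()
            .filter(|&(s, _)| s < cont_lsn)
            .max_by_key(|&(_, e)| e.min(cont_lsn));

        match (image, delta) {
            // An image at least as new as the newest delta ends the search.
            (Some(lsn), Some((_, e))) if lsn + 1 >= e.min(cont_lsn) => {
                visited.push(Visited::Image { lsn });
                break;
            }
            (Some(lsn), None) => {
                visited.push(Visited::Image { lsn });
                break;
            }
            // Otherwise read the delta and continue below its start.
            (_, Some((s, e))) => {
                visited.push(Visited::Delta { start: s, end: e.min(cont_lsn) });
                cont_lsn = s;
            }
            (None, None) => break,
        }
    }
    visited
}
```

With a delta layer at `[10, 40)`, an in-memory layer starting at 40 and an image layer at LSN 60, `plan_read(Some(40), &historic, 60)` visits the in-memory layer and then the old delta, and never sees the image, which matches the "one InMemoryLayer and one persistent delta layer" pattern in the logs. If the in-memory layer is instead modelled as already flushed to a delta at `[40, 61)` and `None` is passed, the same read stops at the image layer.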

jcsp, Oct 15 '24 12:10