neon deal with wal records bigger than wal_seg

Sep 06 '21 21:09 kelvich

Got some context on this? Where do we not handle them correctly?

Sep 16 '21 19:09 hlinnaka

That was about find_end_of_wal() in the safekeeper -- when it faces continuation record it tries to iterate backwards on WAL to reconstruct whole record and check CRC and assumes that record starts in that previous segment. IIUC there is a code to that fixes that behavior for cases when we need deeper look behind: https://github.com/zenithdb/zenith/commits/safe_flush_ptr That issue about extracting and testing that

Sep 17 '21 07:09 kelvich

cc @petuhovskiy , you've mentioned that you're testing this already, but the test hadn't failed :) can you share some more details? thanks!

Jan 10 '22 14:01 stepashka

cc @petuhovskiy , you've mentioned that you're testing this already, but the test hadn't failed :) can you share some more details? thanks!

https://github.com/zenithdb/zenith/commit/4468797a237ae500e735f702ffcb63acfca47a89 This is the commit with the test, that wasn't failing for me at the time, one can look at it to create failing test

Jan 10 '22 14:01 petuhovskiy

I'm wondering if there is some bug in find_end_of_wal_segment's logic here? In my test xlp_rem_len in adjacent pages in a partial WAL record differ by either 8152 bytes (8192 - 40 = XLOG_BLCKSZ - XLOG_SIZE_OF_XLG_LONG_PHD) 8168 bytes (8192 - 24 = XLOG_BLCKSZ - XLOG_SIZE_OF_XLG_SHORT_PHD), not 8192 exactly. So it seems to me that xlp_rem_len does not account for page headers, while the code assumes it does and it's safe to just offset inside the WAL file instead of re-reading all pages/blocks (not sure what's the right term here).

So my test does not actually need to make records bigger than wal_seg_size, bigger than XLOG_BLCKSZ is already enough.

Needs further investigation.

Feb 05 '22 16:02 yeputons

Progress so far:

[x] Add some corner case unit tests for find_end_of_wal
[x] Correctly skip contrecords in the last segment
[x] Handle WAL ending in a big multi-segment record. That requires reading previous segments (only the first few bytes of each, though) if we want to validate this last record.
[x] Address bunch of review comments from #1574 (including my own, one can even craft a test case for it), they may reveal more bugs and corner cases. Maybe reuse of Pageserver's code is a good idea, unless it does lots of parsing which we want to avoid in Safekeeper.

May 20 '22 15:05 yeputons

neon
neon copied to clipboard

deal with wal records bigger than wal_seg_size

neon neon copied to clipboard

deal with wal records bigger than wal_seg_size

neon
neon copied to clipboard