Correctly truncate VM
Problem
https://github.com/neondatabase/neon/issues/9240
Summary of changes
Correctly truncate the VM page instead of just replacing it with a zero page.
Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.
Checklist before merging
- [ ] Do not forget to reformat commit message to not include the above checklist
5417 tests run: 5194 passed, 3 failed, 220 skipped (full report)
Failures on Postgres 17
test_sharding_split_failures[failure4]: debug-x86-64
test_sharding_split_failures[failure7]: debug-x86-64
test_sharding_split_failures[failure8]: debug-x86-64
# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_sharding_split_failures[debug-pg17-failure4] or test_sharding_split_failures[debug-pg17-failure7] or test_sharding_split_failures[debug-pg17-failure8]"
Flaky tests (9)
Postgres 17
test_sharding_split_failures[failure2]: debug-x86-64
test_sharding_split_failures[failure13]: debug-x86-64
test_sharding_split_failures[failure15]: debug-x86-64
test_sharding_split_failures[failure14]: debug-x86-64
test_sharding_split_smoke: debug-x86-64
test_sharding_split_compaction[compact-shard-ancestors-persistent]: debug-x86-64
test_create_churn_during_restart: debug-x86-64
Postgres 15
test_create_churn_during_restart: release-x86-64
Postgres 14
test_get_tenant_size_with_multiple_branches: release-x86-64
Code coverage* (full report)
functions: 31.8% (7889 of 24834 functions)
lines: 49.4% (62436 of 126295 lines)
* collected from Rust tests only
7303d05fd55b42ae9082a282ebc611e27e06852b at 2024-11-13T08:53:39.675Z :recycle:
test_pg_regress times out now. Do you know why that can happen?
I copied some more comments from the corresponding postgres function here.
If it's difficult to write tests using just SQL commands, one approach is to write a bespoke little C test extension that explicitly calls smgrtruncate() to truncate the VM etc. In this case, I don't think it's too hard to hit all the codepaths from SQL either, though. Or you can do both; more tests is generally better.
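For reference, the SQL-only route into RelationTruncate can be sketched roughly like this (a minimal, hypothetical example in the style of the test fixtures used later in this thread; the function name, table name, and row counts are made up): a VACUUM that finds the tail of the relation empty truncates the heap, and with it the VM fork.

    from fixtures.neon_fixtures import NeonEnv

    def test_vm_truncate_from_sql(neon_simple_env: NeonEnv):
        # Minimal sketch: drive RelationTruncate (and visibilitymap_prepare_truncate)
        # purely from SQL by letting VACUUM truncate an emptied relation tail.
        endpoint = neon_simple_env.endpoints.create_start("main")
        con = endpoint.connect()
        cur = con.cursor()
        cur.execute("create table t(pk integer primary key)")
        cur.execute("insert into t values (generate_series(1, 100000))")
        cur.execute("vacuum t")                       # sets all-visible bits in the VM fork
        cur.execute("delete from t where pk > 1000")
        cur.execute("vacuum t")                       # tail is now empty, so VACUUM truncates heap and VM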
It is not a problem to create a zero page at the PS (caused by VM truncation). The problem is how to cause clearing of the all-visible bit for this page: before clearing all-visible we first need to set all-visible, and somehow prevent this change from being propagated to the PS.
I see only one way this error can be reproduced - please look at RelationTruncate:
/* Prepare for truncation of the visibility map too if it exists */
vm = smgrexists(RelationGetSmgr(rel), VISIBILITYMAP_FORKNUM);
if (vm)
{
    blocks[nforks] = visibilitymap_prepare_truncate(rel, nblocks);
    if (BlockNumberIsValid(blocks[nforks]))
    {
        forks[nforks] = VISIBILITYMAP_FORKNUM;
        nforks++;
    }
}
...
if (RelationNeedsWAL(rel))
{
    ...
    lsn = XLogInsert(RM_SMGR_ID,
                     XLOG_SMGR_TRUNCATE | XLR_SPECIAL_REL_UPDATE);
}
It first updates the VM page in visibilitymap_prepare_truncate and then WAL-logs this truncation.
If the VM page is thrown away between these two events, then we actually get a zeroed page in the PS.
But it seems to be impossible to reliably reproduce this without patching the Postgres code to add a fail point in this place.
I don't follow. If the page is thrown away between modifying the VM page and WAL-logging, then the pageserver would still contain the old non-zero version of the VM page, and there would be no truncation in the pageserver.
We WAL-log all dirty VM/FSM pages when they are evicted. So in most cases, on VM truncation the following happens:
1. the VM page is updated on the compute
2. the SMGR_TRUNCATE record is appended to the WAL
3. the VM page is written out and an FPI is appended to the WAL
So in the PS, for this VM page we have the sequence SMGR_TRUNCATE, FPI, after applying which we get a non-zero page. But if eviction happens between 1 and 2, then we get:
1. the VM page is updated on the compute
2. the VM page is written out and an FPI is appended to the WAL
3. the SMGR_TRUNCATE record is appended to the WAL
In this case the sequence is FPI, SMGR_TRUNCATE, which produces a zero VM page.
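To make the ordering concrete, here is a toy model (purely illustrative, not pageserver code) of how replay order decides the final VM page image under the pre-fix behaviour, where SMGR_TRUNCATE replaced the page with zeros:

    PAGE_SIZE = 8192
    ZERO_PAGE = b"\x00" * PAGE_SIZE

    def replay(records):
        # Toy model: materialize one VM page by applying its WAL records in LSN order.
        page = ZERO_PAGE
        for kind, payload in records:
            if kind == "FPI":
                # full-page image, logged when the dirty VM page is evicted
                page = payload
            elif kind == "SMGR_TRUNCATE":
                # pre-fix behaviour: the VM page is replaced with a zero page
                page = ZERO_PAGE
        return page

    vm_image = b"\x01" * PAGE_SIZE  # stand-in for a VM page with visibility bits set

    # Usual order: truncate record first, FPI later -> the FPI wins, the page is non-zero.
    assert replay([("SMGR_TRUNCATE", None), ("FPI", vm_image)]) == vm_image

    # Race: dirty VM page evicted (FPI) before the truncate record -> the page ends up zeroed.
    assert replay([("FPI", vm_image), ("SMGR_TRUNCATE", None)]) == ZERO_PAGE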
I also tried a "brute force" approach, trying to reproduce the problem non-deterministically but more or less reliably, by running many concurrent tasks performing relation truncation:
import threading

from fixtures.neon_fixtures import NeonEnv


#
# Test that VM is properly truncated
#
def test_vm_truncate(neon_simple_env: NeonEnv):
    env = neon_simple_env

    n_iterations = 100
    n_threads = 16
    threads = []

    endpoint = env.endpoints.create_start("main")

    con = endpoint.connect()
    cur = con.cursor()
    cur.execute("CREATE EXTENSION neon_test_utils")

    def truncate(tid):
        con = endpoint.connect()
        cur = con.cursor()
        cur.execute("set enable_seqscan=off")
        cur.execute(f"create table t{tid}(pk integer primary key, counter integer default 0, filler text default repeat('?', 200))")
        for i in range(n_iterations):
            cur.execute(f"insert into t{tid} (pk) values (generate_series(1,1000))")
            cur.execute(f"delete from t{tid} where pk>1")
            cur.execute(f"vacuum t{tid}")
            cur.execute(f"update t{tid} set counter=counter+1")
            cur.execute("SELECT clear_buffer_cache()")
            cur.execute(f"select count(*) from t{tid}")
            cur.execute(f"delete from t{tid}")

    for i in range(n_threads):
        t = threading.Thread(target=truncate, args=(i,))
        threads.append(t)
        t.start()

    for t in threads:
        t.join()
... but still without any success.
I added a test which just inspects the content of the VM page and checks that the truncation is performed correctly.
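A rough sketch of what such a check can look like, assuming the pg_visibility extension is available in the compute image (the function name, table name, row counts, and exact assertion are illustrative and may differ from the actual test):

    from fixtures.neon_fixtures import NeonEnv

    def test_vm_truncate_keeps_bits(neon_simple_env: NeonEnv):
        endpoint = neon_simple_env.endpoints.create_start("main")
        con = endpoint.connect()
        cur = con.cursor()
        cur.execute("CREATE EXTENSION neon_test_utils")
        cur.execute("CREATE EXTENSION pg_visibility")
        cur.execute("create table t(pk integer primary key)")
        cur.execute("insert into t values (generate_series(1, 100000))")
        cur.execute("vacuum t")                       # sets all-visible bits in the VM
        cur.execute("delete from t where pk > 1000")
        cur.execute("vacuum t")                       # truncates the heap and the VM fork
        # Drop shared buffers so the VM page is read back from the pageserver.
        cur.execute("SELECT clear_buffer_cache()")
        # The surviving blocks should still be all-visible; a wholesale zeroed VM
        # page would report no all-visible bits at all.
        cur.execute("select count(*) from pg_visibility_map('t') where all_visible")
        assert cur.fetchone()[0] > 0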