neon icon indicating copy to clipboard operation
neon copied to clipboard

Correctly truncate VM

Open knizhnik opened this issue 1 year ago • 6 comments

Problem

https://github.com/neondatabase/neon/issues/9240

Summary of changes

Correctly truncate VM page instead just replacing it with zero page.

Checklist before requesting a review

  • [ ] I have performed a self-review of my code.
  • [ ] If it is a core feature, I have added thorough tests.
  • [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • [ ] Do not forget to reformat commit message to not include the above checklist

knizhnik avatar Oct 10 '24 06:10 knizhnik

5417 tests run: 5194 passed, 3 failed, 220 skipped (full report)


Failures on Postgres 17

# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_sharding_split_failures[debug-pg17-failure4] or test_sharding_split_failures[debug-pg17-failure7] or test_sharding_split_failures[debug-pg17-failure8]"
Flaky tests (9)

Postgres 17

Postgres 15

Postgres 14

Code coverage* (full report)

  • functions: 31.8% (7889 of 24834 functions)
  • lines: 49.4% (62436 of 126295 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
7303d05fd55b42ae9082a282ebc611e27e06852b at 2024-11-13T08:53:39.675Z :recycle:

github-actions[bot] avatar Oct 10 '24 07:10 github-actions[bot]

test_pg_regress times out now. Do you know why that can happen?

arpad-m avatar Oct 10 '24 08:10 arpad-m

I copied some more comments from the corresponding postgres function here.

hlinnaka avatar Oct 10 '24 10:10 hlinnaka

If it's difficult to write tests using just SQL commands, one approach is to write a bespoken little C test extension that explicitly calls smgrtruncate() to truncate the VM etc. In this case, I don't think it's too hard to hit all the codepaths from SQL either though. Or you can do both; more tests is generally better.

hlinnaka avatar Oct 18 '24 08:10 hlinnaka

If it's difficult to write tests using just SQL commands, one approach is to write a bespoken little C test extension that explicitly calls smgrtruncate() to truncate the VM etc. In this case, I don't think it's too hard to hit all the codepaths from SQL either though. Or you can do both; more tests is generally better.

It is not a problem to create zero page at PS (caused by VM truncation). The problem is how to cause clearing of all-visible bit for this page because prior to clear all-visible we first need to set all-visible and somehow prevent this change from been propagated to PS.

knizhnik avatar Oct 18 '24 17:10 knizhnik

I see only one way how this error con be reproduced - please look at RelationTruncate:

	/* Prepare for truncation of the visibility map too if it exists */
	vm = smgrexists(RelationGetSmgr(rel), VISIBILITYMAP_FORKNUM);
	if (vm)
	{
		blocks[nforks] = visibilitymap_prepare_truncate(rel, nblocks);
		if (BlockNumberIsValid(blocks[nforks]))
		{
			forks[nforks] = VISIBILITYMAP_FORKNUM;
			nforks++;
		}
	}
        ...
	if (RelationNeedsWAL(rel))
	{
                ...
		lsn = XLogInsert(RM_SMGR_ID,
						 XLOG_SMGR_TRUNCATE | XLR_SPECIAL_REL_UPDATE);
	}

It first update VM page in visibilitymap_prepare_truncate and then wan-log this truncation. If VM page is throw away between this two events, then we actually have zeroed page in PS. But it seems tone impossible toreliably reproduce it without patch Postgres code in adding fail point in this place.

knizhnik avatar Oct 20 '24 12:10 knizhnik

I see only one way how this error con be reproduced - please look at RelationTruncate:

	/* Prepare for truncation of the visibility map too if it exists */
	vm = smgrexists(RelationGetSmgr(rel), VISIBILITYMAP_FORKNUM);
	if (vm)
	{
		blocks[nforks] = visibilitymap_prepare_truncate(rel, nblocks);
		if (BlockNumberIsValid(blocks[nforks]))
		{
			forks[nforks] = VISIBILITYMAP_FORKNUM;
			nforks++;
		}
	}
        ...
	if (RelationNeedsWAL(rel))
	{
                ...
		lsn = XLogInsert(RM_SMGR_ID,
						 XLOG_SMGR_TRUNCATE | XLR_SPECIAL_REL_UPDATE);
	}

It first update VM page in visibilitymap_prepare_truncate and then wan-log this truncation. If VM page is throw away between this two events, then we actually have zeroed page in PS. But it seems tone impossible toreliably reproduce it without patch Postgres code in adding fail point in this place.

I don't follow. If the page is thrown away between modifying the VM page and WAL-logging, then the pageserver would still contain the old non.-zero version of the VM page, and there would be no truncation in the pageserver.

hlinnaka avatar Oct 21 '24 17:10 hlinnaka

I don't follow. If the page is thrown away between modifying the VM page and WAL-logging, then the pageserver would still contain the old non.-zero version of the VM page, and there would be no truncation in the pageserver.

We wallog all dirty VM/FSM pages when they are evicted. So what will happen in most cases on VM truncations:

  1. VM page is update on compute
  2. SMGR_TRUNCATE record is appended to the WAL
  3. VM pages is written and FPI is appended to the WAL

So in PS for this VM page we have sequence: SMGR_TRUNCATE,FPI after applying which we will get non-zero page. But if eviction happened between 1 and 2, then we will get:

  1. VM page is update on compute
  2. VM pages is written and FPI is appended to the WAL
  3. SMGR_TRUNCATE record is appended to the WAL

In this case the sequence will be FPI,SDMGR_TRUNCATE which produce zero VM page.

knizhnik avatar Oct 21 '24 18:10 knizhnik

I tried also "brute force" truing to reproduce the problem not deterministically, but more or less reliably, by running many concurrent tasks performing relation truncation:

import threading
from fixtures.neon_fixtures import NeonEnv


#
# Test that VM is properly truncated
#
def test_vm_truncate(neon_simple_env: NeonEnv):
    env = neon_simple_env

    n_iterations = 100
    n_threads = 16
    
    threads = []
    endpoint = env.endpoints.create_start("main")

    con = endpoint.connect()
    cur = con.cursor()
    cur.execute("CREATE EXTENSION neon_test_utils")

    def truncate(tid):
        con = endpoint.connect()
        cur = con.cursor()
        cur.execute("set enable_seqscan=off")
        cur.execute(f"create table t{tid}(pk integer primary key, counter integer default 0, filler text default repeat('?', 200))")
        for i in range(n_iterations):
            cur.execute(f"insert into t{tid} (pk) values (generate_series(1,1000))")
            cur.execute(f"delete from t{tid} where pk>1")
            cur.execute(f"vacuum t{tid}")
            cur.execute(f"update t{tid} set counter=counter+1")
            cur.execute("SELECT clear_buffer_cache()")
            cur.execute(f"select count(*) from t{tid}")
            cur.execute(f"delete from t{tid}")

    for i in range(n_threads):
        t = threading.Thread(target=truncate, args=(i,))
        threads.append(t)
        t.start()

    for t in threads:
        t.join()

... but still without any success.

knizhnik avatar Oct 23 '24 07:10 knizhnik

I added test which just inspect content of VM page and checks that truncation is performed correctly.

knizhnik avatar Nov 01 '24 08:11 knizhnik