test_change_pageserver is unstable due to async reload signal handling
Multiple failures of test_change_pageserver, e.g.:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10993/13587168793/index.html#/testresult/940f78b1b81ded4a test_change_pageserver[release-pg16] / X64 / __sanitizers: 'disabled'
https://neon-github-public-dev.s3.amazonaws.com/reports/main/13316989014/index.html#/testresult/e03d8f42343a4def test_change_pageserver[release-pg16] / ARM64 / __sanitizers: 'disabled'
https://neon-github-public-dev.s3.amazonaws.com/reports/main/13720657425/index.html#/testresult/328dcbe2aa327d95 test_change_pageserver[release-pg14] / ARM64 / __sanitizers: 'disabled'
https://neon-github-public-dev.s3.amazonaws.com/reports/main/13824996212/index.html#/testresult/5611bc191d85599 test_change_pageserver[release-pg17] / ARM64 / __sanitizers: 'enabled'
with the following diagnostics:
test_runner/regress/test_change_pageserver.py:91: in test_change_pageserver
    connstring = fetchone()
test_runner/regress/test_change_pageserver.py:56: in fetchone
    assert all(result == results[0] for result in results)
E   assert False
E    +  where False = all(<generator object test_change_pageserver.<locals>.fetchone.<locals>.<genexpr> at 0xffd43561dc40>)
with the corresponding test fragment:
    def fetchone():
        results = [cur.fetchone() for cur in curs]
        assert all(result == results[0] for result in results)
        return results[0]
    ...
    endpoint.reconfigure(pageserver_id=alt_pageserver_id)
    # Verify that the neon.pageserver_connstring GUC is set to the correct thing
    execute("SELECT setting FROM pg_settings WHERE name='neon.pageserver_connstring'")
    connstring = fetchone()
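These failures indicate that the test might fail because "reconfigure" is processed asynchronously: a backend may not yet have picked up the new neon.pageserver_connstring when the query runs.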
With the patches pg_settings-async-debug.patch.txt and test_change_pageserver.patch.txt applied, the test fails on every run for me, with the following messages in test.log:
    def fetchone():
        results = []
        for cur in curs:
            res = cur.fetchone()
            log.info(f"!!!res: {res}")
            results.append(res)
>       assert all(result == results[0] for result in results)
E       assert False
E        +  where False = all(<generator object test_change_pageserver.<locals>.fetchone.<locals>.<genexpr> at 0x7005fc7fbbc0>)
...
2025-03-16 14:32:33.239 INFO [test_change_pageserver.py:58] !!!res: ('postgresql://no_user@localhost:15005',)
2025-03-16 14:32:33.239 INFO [test_change_pageserver.py:58] !!!res: ('postgresql://no_user@localhost:15007',)
2025-03-16 14:32:33.239 INFO [test_change_pageserver.py:58] !!!res: ('postgresql://no_user@localhost:15007',)
---------------------------- Captured log teardown -----------------------------
2025-03-16 14:32:33.382 INFO [neon_fixtures.py:942] Cleaning up all storage and compute nodes
Note that the first connection string (port 15005) differs from the others (port 15007).
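If the mismatch is indeed a timing issue, one possible way to make the check tolerant of it is to poll until all connections report the same value instead of asserting on the first read. The following is only a minimal sketch under that assumption, not the fix adopted in the repository; wait_for_agreement, read_all, and the timeout/interval defaults are hypothetical names and values introduced for illustration.

```python
import time
from typing import Callable, Optional, Sequence


def wait_for_agreement(
    read_all: Callable[[], Sequence[object]],
    expected: Optional[object] = None,
    timeout: float = 10.0,
    interval: float = 0.1,
) -> object:
    """Poll read_all() until all returned values are equal.

    read_all is a hypothetical callback that re-runs the query on every
    connection and returns one result per connection.
    If expected is given, the common value must also equal it, so that
    agreement on a stale value is not accepted.
    Raises TimeoutError if the values do not converge in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        values = list(read_all())
        agreed = bool(values) and all(v == values[0] for v in values)
        if agreed and (expected is None or values[0] == expected):
            return values[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"values did not converge: {values!r}")
        time.sleep(interval)
```

In the test, read_all could be a closure that re-executes the SELECT on pg_settings for each cursor and collects cur.fetchone(). Passing expected matters: without it, the helper would also accept the case where every backend still reports the old connection string.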
This failure is still happening: https://neon-github-public-dev.s3.amazonaws.com/reports/main/15289357837/index.html#/testresult/33be51a12a39d822 5/28/2025 4:49:14 – 4:49:31
test_runner/regress/test_change_pageserver.py:79: in test_change_pageserver
    connstring = fetchone()
test_runner/regress/test_change_pageserver.py:44: in fetchone
    assert all(result == results[0] for result in results)
E   assert False
E    +  where False = all(<generator object test_change_pageserver.<locals>.fetchone.<locals>.<genexpr> at 0xff2786acbbc0>)
https://neon-github-public-dev.s3.amazonaws.com/reports/main/15406417966/index.html#/testresult/7f9cec989573c082 6/3/2025 4:52:00 – 4:52:17
test_runner/regress/test_change_pageserver.py:79: in test_change_pageserver
    connstring = fetchone()
test_runner/regress/test_change_pageserver.py:44: in fetchone
    assert all(result == results[0] for result in results)
E   assert False
E    +  where False = all(<generator object test_change_pageserver.<locals>.fetchone.<locals>.<genexpr> at 0xffeef18c9e00>)
This issue was moved to Jira: LKB-1778