Several server crashes when running tests in parallel

Open saygoodbyye opened this issue 1 year ago • 4 comments

Describe the bug
Several server crashes occur when running tests in the way described below.

How are you accessing AGE (Command line, driver, etc.)?
Accessing AGE through the command line.

What data setup do we need to do?
Apache AGE (PG16 branch) with PostgreSQL (REL_16_STABLE).

What is the necessary configuration info needed?
First build:

./configure CFLAGS=" -Og" --enable-tap-tests --enable-debug --enable-cassert

Second build:

./configure CFLAGS=" -Og" --enable-tap-tests --enable-debug

I was able to crash the server by doing the following.

Makefile:

diff --git a/Makefile b/Makefile
index b405ff6..549e05b 100644
--- a/Makefile
+++ b/Makefile
@@ -85,31 +85,7 @@ SQLS := $(addsuffix .sql,$(SQLS))
 DATA_built = $(age_sql)
 
 # sorted in dependency order
-REGRESS = scan \
-          graphid \
-          agtype \
-          catalog \
-          cypher \
-          expr \
-          cypher_create \
-          cypher_match \
-          cypher_unwind \
-          cypher_set \
-          cypher_remove \
-          cypher_delete \
-          cypher_with \
-          cypher_vle \
-          cypher_union \
-          cypher_call \
-          cypher_merge \
-          age_global_graph \
-          age_load \
-          index \
-          analyze \
-          graph_generation \
-          name_validation \
-          jsonb_operators \
-          drop
+REGRESS=--schedule=schedule
 
 srcdir=`pwd`

schedule:

test: cypher_match cypher_match cypher_match cypher_match cypher_match cypher_match

Start tests:

for i in `seq 100000`;do echo "ITER $i";make -s installcheck;if coredumpctl;then break;fi; done

Results:

ITER 75
# +++ regress install-check in  +++
# using temp instance on port 61958 with PID 39227
# parallel group (6 tests):  cypher_match cypher_match cypher_match cypher_match cypher_match cypher_match
not ok 1     + cypher_match                             1081 ms
# (test process exited with exit code 2)
not ok 2     + cypher_match                             1035 ms
# (test process exited with exit code 2)
not ok 3     + cypher_match                             1061 ms
# (test process exited with exit code 2)
not ok 4     + cypher_match                             1034 ms
# (test process exited with exit code 2)
not ok 5     + cypher_match                             1036 ms
# (test process exited with exit code 2)
not ok 6     + cypher_match                             1035 ms
# (test process exited with exit code 2)
1..6
# 6 of 6 tests failed.
# The differences that caused some tests to fail can be viewed in the file "/home/egor/work/subtree/age/regress/regression.diffs".
# A copy of the test summary that you see above is saved in the file "/home/egor/work/subtree/age/regress/regression.out".

I ran the tests this way many times and obtained several different crashes during those runs, listed below:

Backtrace of the crash on the build with --enable-cassert: bt1_cassert.txt
Backtrace of the first crash on the build without --enable-cassert: bt2.txt
Backtrace of the second crash on the build without --enable-cassert: bt3.txt

Expected behavior
Either an ERROR should be reported or the SQL query should execute successfully.

Best regards, Egor Chindyaskin Postgres Professional: http://postgrespro.com/

saygoodbyye avatar Jan 16 '24 05:01 saygoodbyye

@saygoodbyye We do not support user modified Makefiles. The Makefile is only for installing AGE and is only intended to test what was just installed.

If you want to run those specific checks in parallel to highlight an issue, please write a separate script to do so.

jrgemignani avatar Jan 16 '24 18:01 jrgemignani
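
A separate script of the kind suggested above might look like the following. This is only a minimal sketch, not something taken from the AGE repository: `run_parallel` is a hypothetical helper that starts several copies of an arbitrary command at once, waits for all of them, and prints how many exited with a non-zero status. The database name and the path to the test file in the usage comment are assumptions, as is the requirement that the AGE extension is already created in that database.

```shell
#!/bin/sh
# Hypothetical helper: run COPIES instances of a command concurrently.
# Prints the number of instances that exited non-zero and returns
# success only if that number is 0.
# Note: uses plain (global) shell variables to stay POSIX-compatible.
run_parallel() {
    copies=$1
    shift
    pids=""
    failed=0
    i=0

    # Start all copies in the background and remember their PIDs.
    while [ "$i" -lt "$copies" ]; do
        "$@" &
        pids="$pids $!"
        i=$((i + 1))
    done

    # Wait for each copy and count the failures.
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done

    echo "$failed"
    [ "$failed" -eq 0 ]
}

# Usage (assumed database name and file path; adjust to your setup):
#   run_parallel 6 psql -X -d regression_db -f regress/sql/cypher_match.sql
```

Because the helper takes the command as its arguments, the same function can drive `psql` against an already running server without touching the Makefile. A backend crash would typically show up as a lost connection in the failing copies and as an entry in the server log.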

It seems that the server crashes occur when running tests using a specific Makefile setup and iterating through the tests multiple times. Here's a summary of the issue and the steps leading to the server crashes:

Bug Description:

Several server crashes occur when running tests in a loop with the specified Makefile setup.

Steps to Reproduce:

Modify the Makefile to run tests in a loop multiple times.
Start tests using the modified Makefile.
Iterate through the tests multiple times, possibly hundreds of iterations.
Observe server crashes occurring during the test iterations.

Expected Behavior:

Tests should execute successfully without causing server crashes even when run multiple times in a loop.

Environment:

Apache AGE (PG16 branch) with PostgreSQL (REL_16_STABLE).

Makefile Modification:

REGRESS=--schedule=schedule

Modified Test Execution:

for i in `seq 100000`; do echo "ITER $i"; make -s installcheck; if coredumpctl; then break; fi; done

Actual Outcome:

Server crashes occur during the test iterations, leading to failed tests and potential instability.

Workaround:

Since the issue seems related to running tests repeatedly in a loop, you might consider running the tests individually or in smaller batches to reduce the likelihood of server crashes until the root cause can be identified and resolved. Additionally, analyzing the server logs and core dumps generated during the crashes could provide insights into the underlying cause.

diangamichael avatar Apr 05 '24 23:04 diangamichael

@jrgemignani, I was able to reproduce this bug in a different way, using the pgreplay utility. To reproduce the bug, follow these steps:

*** configure and install pgreplay ***
*** then replay attached postmaster.log file ***
./pgreplay -j postmaster.log

postmaster.log

saygoodbyye avatar Apr 11 '24 12:04 saygoodbyye

This issue is stale because it has been open 60 days with no activity. Remove "Abondoned" label or comment or this will be closed in 14 days.

github-actions[bot] avatar Jun 11 '24 00:06 github-actions[bot]

This issue is stale because it has been open 60 days with no activity. Remove "Abondoned" label or comment or this will be closed in 14 days.

github-actions[bot] avatar Aug 11 '24 00:08 github-actions[bot]

This issue was closed because it has been stalled for further 14 days with no activity.

github-actions[bot] avatar Aug 26 '24 00:08 github-actions[bot]