pg_duckdb

Server crashes if parallel run "scan_postgres_tables" test

Open saygoodbyye opened this issue 8 months ago • 5 comments

What happens?

If the "scan_postgres_tables" test is run in parallel as shown below, the server crashes.

PostgreSQL build:

CPPFLAGS="-Og -fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -fno-sanitize=nonnull-attribute -fstack-protector" \
LDFLAGS='-fsanitize=address -fsanitize=undefined -static-libasan' \
./configure --enable-crash-info --enable-tap-tests --with-openssl --enable-debug --enable-cassert --with-icu --with-lz4 --with-libxml
export ASAN_OPTIONS=detect_stack_use_after_return=0:detect_leaks=0:abort_on_error=1:disable_coredump=0:strict_string_checks=1:check_initialization_order=1:strict_init_order=1:detect_odr_violation=0

To Reproduce

Patch test/regression/schedule:

test:   scan_postgres_tables  scan_postgres_tables  scan_postgres_tables  scan_postgres_tables   scan_postgres_tables  scan_postgres_tables  scan_postgres_tables  scan_postgres_tables   scan_postgres_tables  scan_postgres_tables scan_postgres_tables scan_postgres_tables scan_postgres_tables scan_postgres_tables scan_postgres_tables
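To make the reproduction easier to script, the schedule line above can be generated instead of typed by hand. This is a minimal POSIX sh sketch, not part of the original report: the function name `make_parallel_group` is hypothetical, and it assumes the whole schedule file is replaced, since the output below shows only this single group of 15 tests running.

```shell
#!/bin/sh
# Hypothetical helper: print a pg_regress schedule line that puts COUNT
# copies of TEST_NAME into a single parallel group, e.g.
#   test: scan_postgres_tables scan_postgres_tables ...
make_parallel_group() {
    test_name=$1   # regression test to repeat
    count=$2       # number of concurrent copies
    line="test:"
    i=0
    while [ "$i" -lt "$count" ]; do
        line="$line $test_name"
        i=$((i + 1))
    done
    printf '%s\n' "$line"
}
```

For example, `make_parallel_group scan_postgres_tables 15 > test/regression/schedule` reproduces the patched schedule; assuming the file is tracked in git, `git checkout test/regression/schedule` restores the original afterwards.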

Then run:

make installcheck

regression.out:

# parallel group (15 tests):  scan_postgres_tables scan_postgres_tables scan_postgres_tables scan_postgres_tables scan_postgres_tables scan_postgres_tables scan_postgres_tables scan_postgres_tables scan_postgres_tables scan_postgres_tables scan_postgres_tables scan_postgres_tables scan_postgres_tables scan_postgres_tables scan_postgres_tables
not ok 1     + scan_postgres_tables                    22236 ms
# (test process exited with exit code 2)
not ok 2     + scan_postgres_tables                    22045 ms
# (test process exited with exit code 2)
not ok 3     + scan_postgres_tables                    22260 ms
# (test process exited with exit code 2)
not ok 4     + scan_postgres_tables                    22274 ms
# (test process exited with exit code 2)
not ok 5     + scan_postgres_tables                    22153 ms
# (test process exited with exit code 2)
not ok 6     + scan_postgres_tables                    22256 ms
# (test process exited with exit code 2)
not ok 7     + scan_postgres_tables                    22215 ms
# (test process exited with exit code 2)
not ok 8     + scan_postgres_tables                    22263 ms
# (test process exited with exit code 2)
not ok 9     + scan_postgres_tables                    22115 ms
# (test process exited with exit code 2)
not ok 10    + scan_postgres_tables                    25378 ms
# (test process exited with exit code 2)
not ok 11    + scan_postgres_tables                    22264 ms
# (test process exited with exit code 2)
not ok 12    + scan_postgres_tables                    22265 ms
# (test process exited with exit code 2)
not ok 13    + scan_postgres_tables                    22258 ms
# (test process exited with exit code 2)
not ok 14    + scan_postgres_tables                    22240 ms
# (test process exited with exit code 2)
not ok 15    + scan_postgres_tables                    22246 ms
# (test process exited with exit code 2)
1..15
# 15 of 15 tests failed.

backtrace:

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007f77b41c9f4f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007f77b417afb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007f77b4165472 in __GI_abort () at ./stdlib/abort.c:79
#4  0x0000558b2551c51f in __sanitizer::Abort() ()
#5  0x0000558b25528bb1 in __sanitizer::Die() ()
#6  0x0000558b25507f6e in __asan::ScopedInErrorReport::~ScopedInErrorReport() ()
#7  0x0000558b255074d6 in __asan::ReportGenericError(unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long, unsigned int, bool) ()
#8  0x0000558b255085bc in __asan_report_load8 ()
#9  0x00007f77af5013c9 in pgduckdb::PostgresTableReader::GetNextTuple (this=this@entry=0x607000090b70) at src/scan/postgres_table_reader.cpp:271
#10 0x00007f77af4e4ce0 in pgduckdb::PostgresScanTableFunction::PostgresScanFunction (data=..., output=...) at src/scan/postgres_scan.cpp:261
#11 0x00007f77ad47903c in duckdb::PhysicalTableScan::GetData(duckdb::ExecutionContext&, duckdb::DataChunk&, duckdb::OperatorSourceInput&) const () from /tmp/pgsql/lib/libduckdb.so
#12 0x00007f77ad5fb9ab in duckdb::PipelineExecutor::FetchFromSource(duckdb::DataChunk&) () from /tmp/pgsql/lib/libduckdb.so
#13 0x00007f77ad605de7 in duckdb::PipelineExecutor::Execute(unsigned long) () from /tmp/pgsql/lib/libduckdb.so
#14 0x00007f77ad60611f in duckdb::PipelineTask::ExecuteTask(duckdb::TaskExecutionMode) () from /tmp/pgsql/lib/libduckdb.so
#15 0x00007f77ad5fd1a1 in duckdb::ExecutorTask::Execute(duckdb::TaskExecutionMode) () from /tmp/pgsql/lib/libduckdb.so
#16 0x00007f77ad604f52 in duckdb::TaskScheduler::ExecuteForever(std::atomic<bool>*) () from /tmp/pgsql/lib/libduckdb.so
#17 0x00007f77b30d44a3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#18 0x00007f77b41c81f5 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#19 0x00007f77b424889c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

OS:

Debian 12 x86_64

pg_duckdb Version (if built from source use commit hash):

e99ab5d50e59717a73d6a29dbc364e26ebfb83ea

Postgres Version (if built from source use commit hash):

b19893b94bdea3b206cb544619d84cea6276f648

Hardware:

No response

Full Name:

Egor Chindyaskin

Affiliation:

Postgres Professional

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a source build

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • [ ] Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Linux distribution) to reproduce the issue?

  • [ ] Yes, I have

saygoodbyye • Apr 04 '25

I tried reproducing this issue myself, but I was unable to. Does this still reproduce for you?

JelteF • Apr 24 '25

@JelteF, I am still able to reproduce this crash.

I ran make installcheck in a loop until a core dump appeared:

for i in $(seq 10000); do make installcheck; if coredumpctl; then break; fi; done

backtrace:
#0  0x00007fc1bf53fdd7 in _dl_fixup (l=0x7fc1b9fb6460, reloc_arg=1144) at ./elf/dl-runtime.c:48
#1  0x00007fc1bf5422ba in _dl_runtime_resolve_xsavec () at ../sysdeps/x86_64/dl-trampoline.h:130
#2  0x0000557c4a40e894 in CopyErrorData () at elog.c:1762
#3  0x00007fc1b9f1f228 in pgduckdb::__PostgresFunctionGuard__<TupleTableSlot* (*)(PlanState*), ExecProcNode, PlanState*> (
    func_name=func_name@entry=0x7fc1b9fb6d80 "ExecProcNode") at /usr/include/c++/12/bits/new_allocator.h:80
#4  0x00007fc1b9f20d39 in pgduckdb::PostgresTableReader::GetNextTuple (this=this@entry=0x6070000cb420) at src/scan/postgres_table_reader.cpp:272
#5  0x00007fc1b9f04566 in pgduckdb::PostgresScanTableFunction::PostgresScanFunction (data=..., output=...) at src/scan/postgres_scan.cpp:261
#6  0x00007fc1b7e73d2a in duckdb::PhysicalTableScan::GetData(duckdb::ExecutionContext&, duckdb::DataChunk&, duckdb::OperatorSourceInput&) const ()
   from /tmp/pgsql/lib/libduckdb.so
#7  0x00007fc1b7ff731b in duckdb::PipelineExecutor::FetchFromSource(duckdb::DataChunk&) () from /tmp/pgsql/lib/libduckdb.so
#8  0x00007fc1b8001657 in duckdb::PipelineExecutor::Execute(unsigned long) () from /tmp/pgsql/lib/libduckdb.so
#9  0x00007fc1b800198f in duckdb::PipelineTask::ExecuteTask(duckdb::TaskExecutionMode) () from /tmp/pgsql/lib/libduckdb.so
#10 0x00007fc1b7ff8a09 in duckdb::ExecutorTask::Execute(duckdb::TaskExecutionMode) () from /tmp/pgsql/lib/libduckdb.so
#11 0x00007fc1b80007c2 in duckdb::TaskScheduler::ExecuteForever(std::atomic<bool>*) () from /tmp/pgsql/lib/libduckdb.so
#12 0x00007fc1bdad44a3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#13 0x00007fc1bdca81f5 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#14 0x00007fc1bdd2889c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Test output:

not ok 1     + scan_postgres_tables                    43534 ms
# (test process exited with exit code 2)
not ok 2     + scan_postgres_tables                    43356 ms
# (test process exited with exit code 2)
not ok 3     + scan_postgres_tables                    43494 ms
# (test process exited with exit code 2)
not ok 4     + scan_postgres_tables                    43268 ms
# (test process exited with exit code 2)
not ok 5     + scan_postgres_tables                    43540 ms
# (test process exited with exit code 2)
not ok 6     + scan_postgres_tables                    43469 ms
# (test process exited with exit code 2)
not ok 7     + scan_postgres_tables                    43469 ms
# (test process exited with exit code 2)
not ok 8     + scan_postgres_tables                    46470 ms
# (test process exited with exit code 2)
not ok 9     + scan_postgres_tables                    43533 ms
# (test process exited with exit code 2)
not ok 10    + scan_postgres_tables                    43529 ms
# (test process exited with exit code 2)
not ok 11    + scan_postgres_tables                    43532 ms
# (test process exited with exit code 2)
not ok 12    + scan_postgres_tables                    43513 ms
# (test process exited with exit code 2)
not ok 13    + scan_postgres_tables                    43537 ms
# (test process exited with exit code 2)
not ok 14    + scan_postgres_tables                    43522 ms
# (test process exited with exit code 2)
not ok 15    + scan_postgres_tables                    43451 ms
# (test process exited with exit code 2)
1..15
# 15 of 15 tests failed.

saygoodbyye • Apr 24 '25

Not reproducible on my side either. By the way, how do you run scan_postgres_tables in parallel within a single installcheck command? The parallel copies might conflict with each other because they use the same table name and set the same GUC.

YuweiXiao • Apr 30 '25

Hi @saygoodbyye, I'm looking into this one. I tried to compile PG with the same options you provided but got:

configure: WARNING: unrecognized options: --enable-crash-info

I've checked the PG source code and don't see any reference to this option. Am I missing something?

Thanks!

Y-- • May 19 '25

@Y--, this option is unnecessary; please ignore it.

saygoodbyye • May 19 '25

I'm hopeful that this has been fixed by #877, so I'm closing this issue. If you can still reproduce it, please reopen it (or open a new issue).

JelteF • Sep 03 '25