pg_duckdb icon indicating copy to clipboard operation
pg_duckdb copied to clipboard

Second call of pg_reload_conf() after calling any duckdb function crashes the server

Open ggnmstr opened this issue 2 months ago • 5 comments

What happens?

Second call of pg_reload_conf() after calling any duckdb function crashes the server.

To Reproduce

  1. Start a fresh PostgreSQL server with pg_duckdb in shared_preload_libraries
  2. Run this script:
create extension pg_duckdb ;
create table test(a int);
insert into test (a) values (5);

copy test to '/tmp/test.parquet';

select pg_reload_conf();
select pg_reload_conf();
  1. After second call of pg_reload_conf() server will crash:
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.

Logs:


TRAP: failed Assert("IsTransactionState()"), File: "catcache.c", Line: 1378, PID: 360610
postgres: jorq postgres [local] idle(ExceptionalCondition+0x57) [0x9576f7]
postgres: jorq postgres [local] idle() [0x93fc15]
postgres: jorq postgres [local] idle(GetSysCacheOid+0x2e) [0x9534ae]
postgres: jorq postgres [local] idle(get_namespace_oid+0x27) [0x579fb7]
/usr/local/lib/postgresql/pg_duckdb.so(+0x28d02) [0x7f889b380d02]
/usr/local/lib/postgresql/pg_duckdb.so(+0x2925e) [0x7f889b38125e]
/usr/local/lib/postgresql/pg_duckdb.so(+0x2c9c4) [0x7f889b3849c4]
postgres: jorq postgres [local] idle(set_config_with_handle+0x1190) [0x9713d0]
postgres: jorq postgres [local] idle(ProcessConfigFileInternal+0x440) [0x972ca0]
postgres: jorq postgres [local] idle(ProcessConfigFile+0x62) [0x9766b2]
postgres: jorq postgres [local] idle(PostgresMain+0xfb6) [0x8203b6]
postgres: jorq postgres [local] idle(BackendMain+0x4d) [0x81b5ed]
postgres: jorq postgres [local] idle(postmaster_child_launch+0xb1) [0x780001]
postgres: jorq postgres [local] idle() [0x783d98]
postgres: jorq postgres [local] idle(PostmasterMain+0xd74) [0x7858b4]
postgres: jorq postgres [local] idle(main+0x1e8) [0x48e798]
/lib64/libc.so.6(+0x3575) [0x7f889b411575]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7f889b411628]
postgres: jorq postgres [local] idle(_start+0x25) [0x48ea75]
2025-09-25 18:23:11.962 MSK [360591] LOG:  server process (PID 360610) was terminated by signal 6: Aborted
2025-09-25 18:23:11.962 MSK [360591] DETAIL:  Failed process was running: select pg_reload_conf();
2025-09-25 18:23:11.962 MSK [360591] LOG:  terminating any other active server processes
2025-09-25 18:23:11.962 MSK [360784] FATAL:  the database system is in recovery mode
2025-09-25 18:23:11.963 MSK [360591] LOG:  all server processes terminated; reinitializing
2025-09-25 18:23:11.977 MSK [360785] LOG:  database system was interrupted; last known up at 2025-09-25 18:22:15 MSK
2025-09-25 18:23:12.000 MSK [360785] LOG:  database system was not properly shut down; automatic recovery in progress
2025-09-25 18:23:12.001 MSK [360785] LOG:  redo starts at 0/154D1E8
2025-09-25 18:23:12.004 MSK [360785] LOG:  invalid record length at 0/167BEB0: expected at least 24, got 0
2025-09-25 18:23:12.004 MSK [360785] LOG:  redo done at 0/167BE78 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2025-09-25 18:23:12.006 MSK [360786] LOG:  checkpoint starting: end-of-recovery immediate wait
2025-09-25 18:23:12.103 MSK [360786] LOG:  checkpoint complete: wrote 230 buffers (1.4%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.008 s, sync=0.085 s, total=0.098 s; sync files=115, longest=0.002 s, average=0.001 s; distance=1211 kB, estimate=1211 kB; lsn=0/167BEB0, redo lsn=0/167BEB0
2025-09-25 18:23:12.105 MSK [360591] LOG:  database system is ready to accept connections

OS:

Linux

pg_duckdb Version (if built from source use commit hash):

a8b1e105a0aa67a401abb1c1bffa92bb5a221411

Postgres Version (if built from source use commit hash):

53a2246ddf9b1e641654dd0654ff8f24e952920a

Hardware:

No response

Full Name:

Jora Babayan

Affiliation:

Postgres Professional

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have not tested with any build

Did you include all relevant data sets for reproducing the issue?

No - Other reason (please specify in the issue body)

Did you include all code required to reproduce the issue?

  • [x] Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Linux distribution) to reproduce the issue?

  • [x] Yes, I have

ggnmstr avatar Sep 25 '25 15:09 ggnmstr

Okay, that's obviously not great. I think it could be related to #867

JelteF avatar Sep 26 '25 08:09 JelteF

Could you check if #936 solves it for you?

JelteF avatar Sep 26 '25 08:09 JelteF

Unfortunately, #936 did not help

ggnmstr avatar Sep 26 '25 11:09 ggnmstr

Found out some details that may help with understanding the problem.

So, if we run the same scenario, but provide the COPY statement in 2 lines, server will not crash. This is the example:

postgres=# copy test to '/tmp/test.parquet
postgres'# ';
COPY 1

After running this COPY statement, server will NOT crash due to multiple calls of pg_reload_conf()

This may be somehow related to DuckdbUtilityHook_Cpp, becase in this case code below is NOT executed:

static void
DuckdbUtilityHook_Cpp(PlannedStmt *pstmt, const char *query_string, bool read_only_tree, ProcessUtilityContext context,
                      ParamListInfo params, struct QueryEnvironment *query_env, DestReceiver *dest,
                      QueryCompletion *qc) {
	Node *parsetree = pstmt->utilityStmt;
	if (IsA(parsetree, CopyStmt)) {
		auto copy_query = PostgresFunctionGuard(MakeDuckdbCopyQuery, pstmt, query_string, query_env);
		if (copy_query) { 
            // This branch is not executed if we provide copy statement like above!
			auto res = pgduckdb::DuckDBQueryOrThrow(copy_query);
			auto chunk = res->Fetch();
			auto processed = chunk->GetValue(0, 0).GetValue<uint64_t>();
			if (qc) {
				SetQueryCompletion(qc, CMDTAG_COPY, processed);
			}
			return;
		}
	}

ggnmstr avatar Oct 02 '25 10:10 ggnmstr

I can reproduce the issue. The backtrace:

Image
postgres=# copy test to '/tmp/test.parquet
postgres'# ';
COPY 1

Appending a new line to the filename will trick the server to use Postgres to do the copy instead, i.e., csv format. So no duckdb connection is initialized.

YuweiXiao avatar Oct 09 '25 03:10 YuweiXiao

Found out some new details. The problem code is located in DuckAssignTimezone function. That's an assign_hook for PostgreSQL's vanila GUC option timezone set by pg_duckdb.

static void
DuckAssignTimezone_Cpp(const char *tz) {
// some code ....
// NOTE: COMMENTING NEXT 2 STATEMENTS FIXES THE PROBLEM!!!
  auto connection = pgduckdb::DuckDBManager::GetConnection(false);
  pgduckdb::DuckDBQueryOrThrow(*connection, "SET TimeZone =" + duckdb::KeywordHelper::WriteQuoted(tz));
 // some code ....
}

Commenting out provided 2 statements solves the problem.

backtrace:

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007f041c481f63 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:89
#2  0x00007f041c427f3e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007f041c40f6d0 in __GI_abort () at abort.c:77
#4  0x0000000000957f18 in ExceptionalCondition (conditionName=conditionName@entry=0xa63029 "IsTransactionState()",
    fileName=fileName@entry="postgres/src/backend/utils/error/assert.c", lineNumber=lineNumber@entry=1378) at assert.c:66
#5  0x0000000000940415 in SearchCatCacheInternal (cache=0x51cb880, nkeys=1, v1=139655630284902, v1@entry=35, v2=v2@entry=0, v3=v3@entry=0, v4=v4@entry=0)
    at postgres/src/backend/utils/cache/catcache.c:1378
#6  0x0000000000940e54 in SearchCatCache (cache=<optimized out>, v1=v1@entry=35, v2=v2@entry=0, v3=v3@entry=0, v4=v4@entry=0)
    at postgres/src/backend/utils/cache/catcache.c:1318
#7  0x0000000000953cae in SearchSysCache (cacheId=cacheId@entry=35, key1=key1@entry=35, key2=key2@entry=0, key3=key3@entry=0, key4=key4@entry=0)
    at postgres/src/backend/utils/cache/syscache.c:217
#8  GetSysCacheOid (cacheId=cacheId@entry=35, oidcol=oidcol@entry=1, key1=key1@entry=139655630284902, key2=key2@entry=0, key3=key3@entry=0, key4=key4@entry=0)
    at postgres/src/backend/utils/cache/syscache.c:460
#9  0x000000000057a277 in get_namespace_oid (nspname=0x7f041c3be866 "duckdb", missing_ok=false)
    at postgres/src/backend/catalog/namespace.c:3542
#10 0x00007f041c3119f6 in pgduckdb::GetSeqLastValue (seq_name=0x7f041c3be969 "extensions_table_seq")
    at src/pgduckdb_duckdb.cpp:226
#11 0x00007f041c3126cc in pgduckdb::DuckDBManager::RefreshConnectionState (this=0x7f041c40d9e0 <pgduckdb::DuckDBManager::manager_instance>, context=...)
    at src/pgduckdb_duckdb.cpp:316
#12 0x00007f041c3129b0 in pgduckdb::DuckDBManager::GetConnection (force_transaction=false)
    at src/pgduckdb_duckdb.cpp:374
#13 0x00007f041c31b75b in pgduckdb::DuckAssignTimezone_Cpp (tz=0x51408f8 "Europe/Moscow")
    at src/pgduckdb_guc.cpp:276
#14 0x00007f041c31bb52 in pgduckdb::__CPPFunctionGuard__<void (*)(char const*), pgduckdb::DuckAssignTimezone_Cpp, char const*> (
    func_name=0x7f041c3c01f4 "DuckAssignTimezone_Cpp", file_name=0x7f041c3c020b "src/pgduckdb_guc.cpp", line=287)
    at src/pgduckdb_guc.cpp:287
#15 0x00007f041c31b929 in pgduckdb::DuckAssignTimezone (newval=0x514cf38 "Europe/Moscow", extra=0x514c960)
    at src/pgduckdb_guc.cpp:287
#16 0x0000000000971c10 in set_config_with_handle (name=<optimized out>, handle=handle@entry=0x0, value=0x518f088 "Europe/Moscow",
    context=context@entry=PGC_SIGHUP, source=source@entry=PGC_S_FILE, srole=srole@entry=10, action=GUC_ACTION_SET, changeVal=<optimized out>, elevel=12,
    is_reload=false) at postgres/src/backend/utils/misc/guc.c:4086
#17 0x00000000009734e0 in set_config_option (name=<optimized out>, value=<optimized out>, context=PGC_SIGHUP, source=PGC_S_FILE, action=GUC_ACTION_SET,
    changeVal=<optimized out>, elevel=0, is_reload=false) at postgres/src/backend/utils/misc/guc.c:3363
#18 ProcessConfigFileInternal (context=context@entry=PGC_SIGHUP, applySettings=applySettings@entry=true, elevel=elevel@entry=13)
    at postgres/src/backend/utils/misc/guc.c:557
#19 0x0000000000976ef2 in ProcessConfigFile (context=context@entry=PGC_SIGHUP)
    at postgres/src/backend/utils/misc/guc-file.l:152

Crash on:

pwndbg> fr 5
#5  0x0000000000940415 in SearchCatCacheInternal (cache=0x51cb880, nkeys=1, v1=139655630284902, v1@entry=35, v2=v2@entry=0, v3=v3@entry=0, v4=v4@entry=0)
    at catcache.c:1378
1378            Assert(IsTransactionState());
pwndbg> p CurrentTransactionState->state
$2 = TRANS_DEFAULT

So, when reloading configuration, pg_duckdb establishes connection to DuckDB. I suppose this leads to some state corruption, which results in crash on the second call of pg_reload_conf()

ggnmstr avatar Dec 01 '25 20:12 ggnmstr

Thanks all for the help debugging this. I found the cause and fixed it in https://github.com/duckdb/pg_duckdb/pull/981

JelteF avatar Dec 02 '25 12:12 JelteF