pg_duckdb
Weird stuff happens in the background worker sometimes
Description
There seems to be an issue with how the background worker syncs tables. It appears to be related to the background worker trying to drop non-existent tables whose OIDs it still knows about. I guess that's because duckdb.tables stores OIDs, and somehow these tables got dropped without their entries being removed from duckdb.tables. I don't know how that happens.
That then resulted in QueryCancelHoldoffCount being 0 when we did not expect it to be. I made a temporary workaround for this in #745. To be clear, I think the two problems are related, but I'm definitely not convinced it's a single bug. It could very well be one bug triggering a second one.
Duck catalog for database 'my_db' in 'postgres': {'uuid': 2432bf03-f7f8-4067-afc6-8ef476b11e15, 'oid': 2353, 'version': 1}
2025-04-25 15:14:33.384 CEST [724922] WARNING: syntax error at or near "24577" at character 12
2025-04-25 15:14:33.384 CEST [724922] QUERY: DROP TABLE 24577
2025-04-25 15:14:33.384 CEST [724922] WARNING: Failed to drop deleted MotherDuck table 24577
2025-04-25 15:14:33.384 CEST [724922] DETAIL: While executing command: DROP TABLE 24577
2025-04-25 15:14:33.384 CEST [724922] HINT: See previous WARNING for details
2025-04-25 15:14:33.384 CEST [724922] WARNING: syntax error at or near "24580" at character 12
2025-04-25 15:14:33.384 CEST [724922] QUERY: DROP TABLE 24580
2025-04-25 15:14:33.384 CEST [724922] WARNING: Failed to drop deleted MotherDuck table 24580
2025-04-25 15:14:33.384 CEST [724922] DETAIL: While executing command: DROP TABLE 24580
2025-04-25 15:14:33.384 CEST [724922] HINT: See previous WARNING for details
TRAP: failed Assert("QueryCancelHoldoffCount > 0"), File: "src/pgduckdb_node.cpp", Line: 299, PID: 724922
postgres: pg_duckdb sync worker (ExceptionalCondition+0x6e)[0x55e6df61c962]
/home/jelte/.pgenv/pgsql-17beta9/lib/pg_duckdb.so(+0x40112)[0x7cccd03e1112]
/home/jelte/.pgenv/pgsql-17beta9/lib/pg_duckdb.so(+0x41be7)[0x7cccd03e2be7]
/home/jelte/.pgenv/pgsql-17beta9/lib/pg_duckdb.so(+0x40150)[0x7cccd03e1150]
postgres: pg_duckdb sync worker (ExecEndCustomScan+0x1a)[0x55e6df2f8e67]
postgres: pg_duckdb sync worker (ExecEndNode+0x167)[0x55e6df2e49b7]
postgres: pg_duckdb sync worker (+0x3022ce)[0x55e6df2de2ce]
postgres: pg_duckdb sync worker (standard_ExecutorEnd+0x66)[0x55e6df2de3a4]
postgres: pg_duckdb sync worker (ExecutorEnd+0x1d)[0x55e6df2de457]
postgres: pg_duckdb sync worker (PortalCleanup+0x64)[0x55e6df275146]
postgres: pg_duckdb sync worker (PortalDrop+0x3f)[0x55e6df652051]
postgres: pg_duckdb sync worker (SPI_cursor_close+0x17)[0x55e6df3233c2]
/home/jelte/.pgenv/pgsql-17beta9/lib/pg_duckdb.so(+0x229ed)[0x7cccd03c39ed]
/home/jelte/.pgenv/pgsql-17beta9/lib/pg_duckdb.so(+0x1f200)[0x7cccd03c0200]
/home/jelte/.pgenv/pgsql-17beta9/lib/pg_duckdb.so(+0x22cb6)[0x7cccd03c3cb6]
/home/jelte/.pgenv/pgsql-17beta9/lib/pg_duckdb.so(+0x1f36a)[0x7cccd03c036a]
/home/jelte/.pgenv/pgsql-17beta9/lib/pg_duckdb.so(+0x1f5f5)[0x7cccd03c05f5]
/home/jelte/.pgenv/pgsql-17beta9/lib/pg_duckdb.so(pgduckdb_background_worker_main+0x18e)[0x7cccd03c0824]
postgres: pg_duckdb sync worker (BackgroundWorkerMain+0x291)[0x55e6df41b24e]
postgres: pg_duckdb sync worker (postmaster_child_launch+0xc7)[0x55e6df41d419]
postgres: pg_duckdb sync worker (+0x444c21)[0x55e6df420c21]
postgres: pg_duckdb sync worker (+0x444ed4)[0x55e6df420ed4]
postgres: pg_duckdb sync worker (+0x44507d)[0x55e6df42107d]
postgres: pg_duckdb sync worker (+0x445e45)[0x55e6df421e45]
postgres: pg_duckdb sync worker (BackgroundWorkerInitializeConnection+0x0)[0x55e6df4234ce]
postgres: pg_duckdb sync worker (main+0x219)[0x55e6df33e2ac]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7ccccf82a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7ccccf82a28b]
postgres: pg_duckdb sync worker (_start+0x25)[0x55e6df0b9ef5]
I found why the OIDs were there: the regclass queries were being sent through DuckDB. That problem is fixed by #770, but I still don't know why QueryCancelHoldoffCount becomes 0, so I'm leaving this issue open for now.