citus icon indicating copy to clipboard operation
citus copied to clipboard

Query with window function, hstore condition and outer join causes segmentation fault

Open xrossdata opened this issue 1 month ago • 7 comments

Running the following SQL query always causes SEGFAULT in Citus 13.1.0/13.1.1 with PG17.5. Tthis issue seems similar to #7705, but SEGFAULT with both of HSTORE condition and OUTER JOIN.

Testing environment:

  • Rocky Linux 9.5
  • Postgres 17.5 (src)
  • citus 13.1.0, 13.1.1
  • 1 coordinator, 3 workers

Reproduce queries:

CREATE SCHEMA segfault_reproduce;

SET search_path TO segfault_reproduce,"$user",public;

DROP TABLE IF EXISTS segfault_reproduce.users CASCADE;
CREATE TABLE segfault_reproduce.users (
    tenant_id VARCHAR(10) NOT NULL,
    session_id TEXT NOT NULL,
    customer_id TEXT,
    personal HSTORE
);

SELECT create_distributed_table('users', 'session_id');
SELECT truncate_local_data_after_distributing_table('users');


DROP TABLE IF EXISTS segfault_reproduce.user_sessions CASCADE;
CREATE TABLE segfault_reproduce.user_sessions (
    tenant_id VARCHAR(10) NOT NULL,
    ts TIMESTAMP NOT NULL,
    session_id TEXT NOT NULL
);

SELECT create_distributed_table('user_sessions', 'session_id');
SELECT truncate_local_data_after_distributing_table('user_sessions');


--
-- With the window function and no hstore condition and a subquery including inner join
-- OK
--
SELECT
    userlist.tenant_id,
    COUNT(userlist.customer_id) OVER (PARTITION BY userlist.tenant_id) AS "all_count"
FROM
    users userlist
INNER JOIN
    (
SELECT
    user_sessions.TENANT_ID,
    user_sessions.SESSION_ID
FROM
    user_sessions
INNER JOIN
    users
ON (user_sessions.TENANT_ID=users.TENANT_ID
AND user_sessions.SESSION_ID=users.SESSION_ID)) search_subquery
ON (userlist.tenant_id=search_subquery.tenant_id
AND search_subquery.session_id=userlist.session_id)
WHERE userlist.tenant_id='1234567890'
GROUP BY
    userlist.tenant_id,
    userlist.customer_id;


--
-- With the window function and no hstore condition and a subquery including outer join
-- OK
--
SELECT
    userlist.tenant_id,
    COUNT(userlist.customer_id) OVER (PARTITION BY userlist.tenant_id) AS "all_count"
FROM
    users userlist
INNER JOIN
    (
SELECT
    user_sessions.TENANT_ID,
    user_sessions.SESSION_ID
FROM
    user_sessions
LEFT OUTER JOIN
    users
ON (user_sessions.TENANT_ID=users.TENANT_ID
AND user_sessions.SESSION_ID=users.SESSION_ID)) search_subquery
ON (userlist.tenant_id=search_subquery.tenant_id
AND search_subquery.session_id=userlist.session_id)
WHERE userlist.tenant_id='1234567890'
GROUP BY
    userlist.tenant_id,
    userlist.customer_id;


--
-- With the window function and a hstore condition and a subquery including inner join
-- OK
--
SELECT
    userlist.tenant_id,
    COUNT(userlist.customer_id) OVER (PARTITION BY userlist.tenant_id) AS "all_count"
FROM
    users userlist
INNER JOIN
    (
SELECT
    user_sessions.TENANT_ID,
    user_sessions.SESSION_ID
FROM
    user_sessions
INNER JOIN
    users
ON (user_sessions.TENANT_ID=users.TENANT_ID
AND user_sessions.SESSION_ID=users.SESSION_ID)
WHERE users.PERSONAL->'foo'='bar') search_subquery
ON (userlist.tenant_id=search_subquery.tenant_id
AND search_subquery.session_id=userlist.session_id)
WHERE userlist.tenant_id='1234567890'
GROUP BY
    userlist.tenant_id,
    userlist.customer_id;


--
-- With no window function and a hstore condition and a subquery including outer join
-- OK
--
SELECT
    userlist.tenant_id,
    COUNT(userlist.customer_id) AS "all_count"
FROM
    users userlist
INNER JOIN
    (
SELECT
    user_sessions.TENANT_ID,
    user_sessions.SESSION_ID
FROM
    user_sessions
LEFT OUTER JOIN
    users
ON (user_sessions.TENANT_ID=users.TENANT_ID
AND user_sessions.SESSION_ID=users.SESSION_ID)
WHERE users.PERSONAL->'foo'='bar') search_subquery
ON (userlist.tenant_id=search_subquery.tenant_id
AND search_subquery.session_id=userlist.session_id)
WHERE userlist.tenant_id='1234567890'
GROUP BY
    userlist.tenant_id,
    userlist.customer_id;


--
-- With the window function and a hstore condition and a subquery including outer join
-- Segmentation Fault
--
SELECT
    userlist.tenant_id,
    COUNT(userlist.customer_id) OVER (PARTITION BY userlist.tenant_id) AS "all_count"
FROM
    users userlist
INNER JOIN
    (
SELECT
    user_sessions.TENANT_ID,
    user_sessions.SESSION_ID
FROM
    user_sessions
LEFT OUTER JOIN
    users
ON (user_sessions.TENANT_ID=users.TENANT_ID
AND user_sessions.SESSION_ID=users.SESSION_ID)
WHERE users.PERSONAL->'foo'='bar') search_subquery
ON (userlist.tenant_id=search_subquery.tenant_id
AND search_subquery.session_id=userlist.session_id)
WHERE userlist.tenant_id='1234567890'
GROUP BY
    userlist.tenant_id,
    userlist.customer_id;


DROP SCHEMA IF EXISTS segfault_reproduce CASCADE;

Logs:

2025-11-07 14:10:09 JST [633279]: [181-1] user=,db=,app=,client= LOG:  server process (PID 571678) was terminated by signal 11: Segmentation fault
2025-11-07 14:10:09 JST [633279]: [182-1] user=,db=,app=,client= DETAIL:  Failed process was running: SELECT
            userlist.tenant_id,
            COUNT(userlist.customer_id) OVER (PARTITION BY userlist.tenant_id) AS "all_count"
        FROM
            users userlist
        INNER JOIN
            (
        SELECT
            user_sessions.TENANT_ID,
            user_sessions.SESSION_ID
        FROM
            user_sessions
        LEFT OUTER JOIN
            users
        ON (user_sessions.TENANT_ID=users.TENANT_ID
        AND user_sessions.SESSION_ID=users.SESSION_ID)
        WHERE users.PERSONAL->'foo'='bar') search_subquery
        ON (userlist.tenant_id=search_subquery.tenant_id
        AND search_subquery.session_id=userlist.session_id)
        WHERE userlist.tenant_id='1234567890'
        GROUP BY
            userlist.tenant_id,
            userlist.customer_id;
2025-11-07 14:10:09 JST [633279]: [183-1] user=,db=,app=,client= LOG:  terminating any other active server processes
2025-11-07 14:10:09 JST [572369]: [1-1] user=[unknown],db=[unknown],app=[unknown],client=[local] LOG:  connection received: host=[local]
2025-11-07 14:10:09 JST [572369]: [2-1] user=xdata,db=xdata_dev_db,app=[unknown],client=[local] FATAL:  the database system is in recovery mode
2025-11-07 14:10:09 JST [633279]: [184-1] user=,db=,app=,client= LOG:  all server processes terminated; reinitializing

xrossdata avatar Nov 07 '25 05:11 xrossdata

Also explain command.

Logs:

2025-11-07 15:29:39 JST [608480]: [21-1] user=,db=,app=,client= LOG:  server process (PID 623865) was terminated by signal 11: Segmentation fault
2025-11-07 15:29:39 JST [608480]: [22-1] user=,db=,app=,client= DETAIL:  Failed process was running: explain SELECT
            userlist.tenant_id,
            COUNT(userlist.customer_id) OVER (PARTITION BY userlist.tenant_id) AS "all_count"
        FROM
            users userlist
        INNER JOIN
            (
        SELECT
            user_sessions.TENANT_ID,
            user_sessions.SESSION_ID
        FROM
            user_sessions
        LEFT OUTER JOIN
            users
        ON (user_sessions.TENANT_ID=users.TENANT_ID
        AND user_sessions.SESSION_ID=users.SESSION_ID)
        WHERE users.PERSONAL->'foo'='bar') search_subquery
        ON (userlist.tenant_id=search_subquery.tenant_id
        AND search_subquery.session_id=userlist.session_id)
        WHERE userlist.tenant_id='1234567890'
        GROUP BY
            userlist.tenant_id,
            userlist.customer_id;
2025-11-07 15:29:39 JST [608480]: [23-1] user=,db=,app=,client= LOG:  terminating any other active server processes
2025-11-07 15:29:39 JST [623913]: [1-1] user=[unknown],db=[unknown],app=[unknown],client=[local] LOG:  connection received: host=[local]
2025-11-07 15:29:39 JST [623913]: [2-1] user=xdata,db=xdata_dev_db,app=[unknown],client=[local] FATAL:  the database system is in recovery mode
2025-11-07 15:29:39 JST [608480]: [24-1] user=,db=,app=,client= LOG:  all server processes terminated; reinitializing

xrossdata avatar Nov 07 '25 06:11 xrossdata

Reproduced on 13.2.0 as well.

xrossdata avatar Nov 07 '25 08:11 xrossdata

@xrossdata thanks for reporting. This looks like an execution-time problem, as the query runs ok on empty tables (below). Can you provide some INSERTs that help in reproducing ? We can try with synthetic data ourselves, but if you have some DML you can share it would be appreciated.

SELECT
    userlist.tenant_id,
    COUNT(userlist.customer_id) OVER (PARTITION BY userlist.tenant_id) AS "all_count"
FROM
    users userlist
INNER JOIN
    (
SELECT
    user_sessions.TENANT_ID,
    user_sessions.SESSION_ID
FROM
    user_sessions
LEFT OUTER JOIN
    users
ON (user_sessions.TENANT_ID=users.TENANT_ID
AND user_sessions.SESSION_ID=users.SESSION_ID)
WHERE users.PERSONAL->'foo'='bar') search_subquery
ON (userlist.tenant_id=search_subquery.tenant_id
AND search_subquery.session_id=userlist.session_id)
WHERE userlist.tenant_id='1234567890'
GROUP BY
    userlist.tenant_id,
    userlist.customer_id;
┌───────────┬───────────┐
│ tenant_id │ all_count │
├───────────┼───────────┤
└───────────┴───────────┘

colm-mchugh avatar Nov 07 '25 13:11 colm-mchugh

Thanks for reply.

In our environment, SEGFAULT with or without data, so we don't believe it occurs with any particular data or combination of data.

I built PG17.5 in debug mode, re-build citus-13.2.0, and got the stack trace .

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
GroupTargetEntryList (groupClauseList=0x24ea850, targetEntryList=0x22e33a0) at planner/multi_logical_optimizer.c:4389
4389                    TargetEntry *groupTargetEntry =
(gdb) bt
#0  GroupTargetEntryList (groupClauseList=0x24ea850, targetEntryList=0x22e33a0) at planner/multi_logical_optimizer.c:4389
#1  0x00007f938bd44413 in WindowPartitionOnDistributionColumn (query=<optimized out>) at planner/query_pushdown_planning.c:490
#2  SafeToPushdownWindowFunction (query=query@entry=0x23cf5e8, errorDetail=errorDetail@entry=0x0) at planner/query_pushdown_planning.c:452
#3  0x00007f938bd3855a in MultiExtendedOpNode (queryTree=queryTree@entry=0x23cf5e8, originalQuery=originalQuery@entry=0x23cf5e8) at planner/multi_logical_planner.c:1766
#4  0x00007f938bd389c9 in MultiNodeTree (queryTree=queryTree@entry=0x23cf5e8) at planner/multi_logical_planner.c:713
#5  0x00007f938bd38d98 in MultiLogicalPlanCreate (originalQuery=originalQuery@entry=0x24ebf18, queryTree=queryTree@entry=0x23cf5e8, plannerRestrictionContext=plannerRestrictionContext@entry=0x22e0888) at planner/multi_logical_planner.c:161
#6  0x00007f938bd25e3e in CreateDistributedPlan (planId=planId@entry=5, allowRecursivePlanning=allowRecursivePlanning@entry=true, originalQuery=0x24ebf18, query=0x23cf5e8, boundParams=0x0, hasUnresolvedParams=hasUnresolvedParams@entry=false, plannerRestrictionContext=0x22e0888)
    at planner/distributed_planner.c:1185
#7  0x00007f938bd265c3 in CreateDistributedPlannedStmt (planContext=planContext@entry=0x7ffdf6a06f70) at planner/distributed_planner.c:737
#8  0x00007f938bd26ca0 in PlanDistributedStmt (rteIdCounter=<optimized out>, planContext=0x7ffdf6a06f70) at planner/distributed_planner.c:672
#9  distributed_planner (parse=0x23cf5e8,
    query_string=0x22273d8 "SELECT\n    userlist.tenant_id,\n    COUNT(userlist.customer_id) OVER (PARTITION BY userlist.tenant_id) AS \"all_count\"\nFROM\n    users userlist\nINNER JOIN\n    (\nSELECT\n    user_sessions.TENANT_ID,\n    us"..., cursorOptions=<optimized out>,
    boundParams=<optimized out>) at planner/distributed_planner.c:273
#10 0x00007f938eb1e55c in pgss_planner () from /usr/local/pgsql17.5-2/lib/pg_stat_statements.so
#11 0x000000000082cdab in pg_plan_query (querytree=0x23cf5e8,
    query_string=query_string@entry=0x22273d8 "SELECT\n    userlist.tenant_id,\n    COUNT(userlist.customer_id) OVER (PARTITION BY userlist.tenant_id) AS \"all_count\"\nFROM\n    users userlist\nINNER JOIN\n    (\nSELECT\n    user_sessions.TENANT_ID,\n    us"...,
    cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0) at postgres.c:908
#12 0x000000000082ce91 in pg_plan_queries (querytrees=0x22ddb30,
    query_string=query_string@entry=0x22273d8 "SELECT\n    userlist.tenant_id,\n    COUNT(userlist.customer_id) OVER (PARTITION BY userlist.tenant_id) AS \"all_count\"\nFROM\n    users userlist\nINNER JOIN\n    (\nSELECT\n    user_sessions.TENANT_ID,\n    us"...,
    cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0) at postgres.c:1000
#13 0x000000000082d146 in exec_simple_query (
    query_string=0x22273d8 "SELECT\n    userlist.tenant_id,\n    COUNT(userlist.customer_id) OVER (PARTITION BY userlist.tenant_id) AS \"all_count\"\nFROM\n    users userlist\nINNER JOIN\n    (\nSELECT\n    user_sessions.TENANT_ID,\n    us"...) at postgres.c:1197
#14 0x000000000082ed6a in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4767
#15 0x0000000000829daf in BackendMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at backend_startup.c:105
#16 0x00000000007a4aaa in postmaster_child_launch (child_type=child_type@entry=B_BACKEND, startup_data=startup_data@entry=0x7ffdf6a078bc "", startup_data_len=startup_data_len@entry=4, client_sock=client_sock@entry=0x7ffdf6a078c0) at launch_backend.c:277
#17 0x00000000007a7fe1 in BackendStartup (client_sock=0x7ffdf6a078c0) at postmaster.c:3594
#18 ServerLoop () at postmaster.c:1676
#19 0x00000000007a9bea in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x21e2740) at postmaster.c:1374
#20 0x000000000050d77b in main (argc=3, argv=0x21e2740) at main.c:199

Here are some configure options and other details:

  • PG configure options : --prefix=/usr/local/pgsql17.5-2 --with-openssl --with-perl --with-llvm --with-systemd --enable-debug
  • gcc-11.5.0-5.el9_5.x86_64
  • openssl-libs-3.2.2-6.el9_5.1.x86_64
  • llvm-18.1.8-3.el9.x86_64
  • citus settings in postgresql.conf
shared_preload_libraries = 'citus,pg_stat_statements'
wal_level = logical

xrossdata avatar Nov 10 '25 04:11 xrossdata

PG17.5(debug) with citus-13.1.1.

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
GroupTargetEntryList (groupClauseList=0x19ab650, targetEntryList=0x17f33a0) at planner/multi_logical_optimizer.c:4389
4389                    TargetEntry *groupTargetEntry =
(gdb) bt
#0  GroupTargetEntryList (groupClauseList=0x19ab650, targetEntryList=0x17f33a0) at planner/multi_logical_optimizer.c:4389
#1  0x00007f480d349373 in WindowPartitionOnDistributionColumn (query=<optimized out>) at planner/query_pushdown_planning.c:489
#2  SafeToPushdownWindowFunction (query=query@entry=0x186cff8, errorDetail=errorDetail@entry=0x0) at planner/query_pushdown_planning.c:451
#3  0x00007f480d33d9aa in MultiExtendedOpNode (queryTree=queryTree@entry=0x186cff8, originalQuery=originalQuery@entry=0x186cff8) at planner/multi_logical_planner.c:1762
#4  0x00007f480d33de19 in MultiNodeTree (queryTree=queryTree@entry=0x186cff8) at planner/multi_logical_planner.c:709
#5  0x00007f480d33e1e8 in MultiLogicalPlanCreate (originalQuery=originalQuery@entry=0x19acd18, queryTree=queryTree@entry=0x186cff8, plannerRestrictionContext=plannerRestrictionContext@entry=0x17f0888) at planner/multi_logical_planner.c:157
#6  0x00007f480d32bbde in CreateDistributedPlan (planId=planId@entry=5, allowRecursivePlanning=allowRecursivePlanning@entry=true, originalQuery=0x19acd18, query=0x186cff8, boundParams=0x0, hasUnresolvedParams=hasUnresolvedParams@entry=false, plannerRestrictionContext=0x17f0888)
    at planner/distributed_planner.c:1204
#7  0x00007f480d32c353 in CreateDistributedPlannedStmt (planContext=planContext@entry=0x7fff7d104b10) at planner/distributed_planner.c:759
#8  0x00007f480d32ca28 in PlanDistributedStmt (rteIdCounter=<optimized out>, planContext=0x7fff7d104b10) at planner/distributed_planner.c:694
#9  distributed_planner (parse=0x186cff8,
    query_string=0x17373d8 "SELECT\n    userlist.tenant_id,\n    COUNT(userlist.customer_id) OVER (PARTITION BY userlist.tenant_id) AS \"all_count\"\nFROM\n    users userlist\nINNER JOIN\n    (\nSELECT\n    user_sessions.TENANT_ID,\n    us"..., cursorOptions=<optimized out>,
    boundParams=<optimized out>) at planner/distributed_planner.c:286
#10 0x00007f481072a55c in pgss_planner () from /usr/local/pgsql17.5-2/lib/pg_stat_statements.so
#11 0x000000000082cdab in pg_plan_query (querytree=0x186cff8,
    query_string=query_string@entry=0x17373d8 "SELECT\n    userlist.tenant_id,\n    COUNT(userlist.customer_id) OVER (PARTITION BY userlist.tenant_id) AS \"all_count\"\nFROM\n    users userlist\nINNER JOIN\n    (\nSELECT\n    user_sessions.TENANT_ID,\n    us"...,
    cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0) at postgres.c:908
#12 0x000000000082ce91 in pg_plan_queries (querytrees=0x17edb30,
    query_string=query_string@entry=0x17373d8 "SELECT\n    userlist.tenant_id,\n    COUNT(userlist.customer_id) OVER (PARTITION BY userlist.tenant_id) AS \"all_count\"\nFROM\n    users userlist\nINNER JOIN\n    (\nSELECT\n    user_sessions.TENANT_ID,\n    us"...,
    cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0) at postgres.c:1000
#13 0x000000000082d146 in exec_simple_query (
    query_string=0x17373d8 "SELECT\n    userlist.tenant_id,\n    COUNT(userlist.customer_id) OVER (PARTITION BY userlist.tenant_id) AS \"all_count\"\nFROM\n    users userlist\nINNER JOIN\n    (\nSELECT\n    user_sessions.TENANT_ID,\n    us"...) at postgres.c:1197
#14 0x000000000082ed6a in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4767
#15 0x0000000000829daf in BackendMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at backend_startup.c:105
#16 0x00000000007a4aaa in postmaster_child_launch (child_type=child_type@entry=B_BACKEND, startup_data=startup_data@entry=0x7fff7d10545c "", startup_data_len=startup_data_len@entry=4, client_sock=client_sock@entry=0x7fff7d105460) at launch_backend.c:277
#17 0x00000000007a7fe1 in BackendStartup (client_sock=0x7fff7d105460) at postmaster.c:3594
#18 ServerLoop () at postmaster.c:1676
#19 0x00000000007a9bea in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x16f2740) at postmaster.c:1374
#20 0x000000000050d77b in main (argc=3, argv=0x16f2740) at main.c:199
(gdb)

xrossdata avatar Nov 10 '25 05:11 xrossdata

Thanks @xrossdata we were able to reproduce the issue with this information. The problem is caused by Citus not handling an optimization that Postgres 17 applies to the window clause of the query; its partition clause can be changed from a SortGroupClause to a Path node by the Postgres planner, and Citus needs to handle this when its trying to determine if the window partition is on a distribution column. It currently assumes all partition clauses are sort group references. The issue does not happen when LEFT JOIN is changed to INNER JOIN because Postgres does not alter the window's partition clause. So something for us to investigate further and fix.

colm-mchugh avatar Nov 10 '25 12:11 colm-mchugh

It looks like the problem may be caused by a dangling pointer; debugging the Postgres planner shows that optimization c651020 is applied to the windows clause, so that it has a zero-length (or NIL) partitionClause. However, after stepping out of function set_plan_references() , the paritionClause appears to be corrupted:

-exec p * (List* ) node->partitionClause
$3 = {type = T_List, length = 16724, max_length = 6, elements = 0x415b00000005, initial_elements = 0x55ab26d69b70}

Its not (yet) clear how this happens, but it looks like a memory overwrite, leading to a corrupted address. It does not cause a problem for Postgres, probably because the partitionClause field is not dereferenced by the executor. But Citus examines it so hits a SEGFAULT. For reference 9d9c02c is a recent(ish) PG commit that may be relevant.

colm-mchugh avatar Nov 10 '25 18:11 colm-mchugh