Motion nodes and loci should not be assigned in utility mode
Greenplum version or build
master as of 992cc6f8e83, did not test on 6X.
Expected behavior
In a psql session connected in utility mode to a segment postmaster, queries such as select * from pg_stat_user_tables; should not cause assertion failure or crash when being planned.
Actual behavior
PGOPTIONS='-c gp_role=utility' psql -d postgres -p 7002
postgres=# select * from pg_stat_user_tables where relname='test';
FATAL: Unexpected internal error (nodeMotion.c:622)
DETAIL: FailedAssertion("!(node->motionID > 0)", File: "nodeMotion.c", Line: 622)
Step to reproduce the behavior
PGOPTIONS='-c gp_role=utility' psql -d postgres -p 7002
postgres=# select * from pg_stat_user_tables where relname='test';
Stack trace from the core:
(lldb) bt
* thread #1, stop reason = signal SIGSTOP
* f #0:libsystem_kernel.dylib __pthread_kill
f #1:libsystem_pthread.dylib pthread_kill
f #2:libsystem_c.dylib abort
f #3:postmaster ExceptionalCondition(conditionName="!(node->motionID > 0)", errorType="FailedAssertion", fileName="nodeMotion.c", lineNumber=622) at assert.c:66
f #4:postmaster ExecInitMotion(node=0x00007fc8a01d7b48, estate=0x00007fc89e01fa50, eflags=16) at nodeMotion.c:622
f #5:postmaster ExecInitNode(node=0x00007fc8a01d7b48, estate=0x00007fc89e01fa50, eflags=16) at execProcnode.c:438
f #6:postmaster ExecInitAppend(node=0x00007fc8a01dc5b8, estate=0x00007fc89e01fa50, eflags=16) at nodeAppend.c:170
f #7:postmaster ExecInitNode(node=0x00007fc8a01dc5b8, estate=0x00007fc89e01fa50, eflags=16) at execProcnode.c:227
f #8:postmaster ExecInitHashJoin(node=0x00007fc8a01e0430, estate=0x00007fc89e01fa50, eflags=16) at nodeHashjoin.c:636
f #9:postmaster ExecInitNode(node=0x00007fc8a01e0430, estate=0x00007fc89e01fa50, eflags=16) at execProcnode.c:368
f #10:postmaster InitPlan(queryDesc=0x00007fc89e00b650, eflags=16) at execMain.c:1984
f #11:postmaster standard_ExecutorStart(queryDesc=0x00007fc89e00b650, eflags=16) at execMain.c:486
f #12:postmaster ExecutorStart(queryDesc=0x00007fc89e00b650, eflags=0) at execMain.c:210
f #13:postmaster PortalStart(portal=0x00007fc89e01d850, params=0x0000000000000000, eflags=0, snapshot=0x0000000000000000, ddesc=0x0000000000000000) at pquery.c:709
f #14:postmaster exec_simple_query(query_string="select * from pg_stat_user_tables where relname='test';") at postgres.c:1739
f #15:postmaster PostgresMain(argc=1, argv=0x00007fc89f00ac68, dbname="postgres", username="apraveen") at postgres.c:5082
f #16:postmaster BackendRun(port=0x00007fc89d404080) at postmaster.c:4793
f #17:postmaster BackendStartup(port=0x00007fc89d404080) at postmaster.c:4461
f #18:postmaster ServerLoop at postmaster.c:1951
f #19:postmaster PostmasterMain(argc=5, argv=0x00007fc89b40abe0) at postmaster.c:1521
f #20:postmaster main(argc=5, argv=0x00007fc89b40abe0) at main.c:240
This quick change seems to fix the issue but it could introduce new regressions. I don't want to check that now, let me note the change here instead.
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1599,7 +1599,7 @@ set_append_path_locus(PlannerInfo *root, Path *pathnode, RelOptInfo *rel,
*subpaths_out = NIL;
/* If no subpath, any worker can execute this Append. Result has 0 rows. */
- if (!subpaths)
+ if (!subpaths || Gp_role != GP_ROLE_DISPATCH)
{
CdbPathLocus_MakeGeneral(&pathnode->locus);
return;
@kainwen can you check ?
I simplify the reproduce step as follows:
create view test as select C.oid AS relid, C.relname AS relname FROM pg_class C ;
PGOPTIONS='-c gp_role=utility' psql -d postgres -p 7002 SELECT allt.relid, allt.relname FROM gp_dist_random('test') allt inner join pg_class c on allt.relid = c.oid;
FATAL: Unexpected internal error (assert.c:44) DETAIL: FailedAssertion("!(node->motionID > 0)", File: "nodeMotion.c", Line: 683)
RCA: test is a view and it is the param of gp_dist_random, we call ApplyRetrieveRule to rewrite RangeTblEntry of test, change rte->rtekind to RTE_SUBQUERY, and forceDistRand of rte is true.
in set_subquery_pathlist set the locus of subquery is strewn
foreach(lc, sub_final_rel->pathlist) { Path *subpath = (Path *) lfirst(lc); Path *path; List *l; List *pathkeys; CdbPathLocus locus;
if (forceDistRand)
CdbPathLocus_MakeStrewn(&locus, getgpsegmentCount());
else
locus = cdbpathlocus_from_subquery(root, rel, subpath);
}
so we add a motion for pg_class in cdbpath_motion_for_join. however, in utility mode we should not has motion node and hit assert panic.
@zxuejing @huansong will this https://github.com/greenplum-db/gpdb/commit/578bd4236880f58272b3a291f5cb659208e3397a fix the issue?
Yes tested that https://github.com/greenplum-db/gpdb/commit/578bd4236880f58272b3a291f5cb659208e3397a has fixed the error cases here. Though the error behavior in #15238 and this one are different but I think the root cause should be the same. @zxuejing could you please check if we can close this issue?
Hi @huansong, @kainwen the RCA of #15238 and this issue is the same. I will close this issue.