gpdb icon indicating copy to clipboard operation
gpdb copied to clipboard

Motion nodes and loci should not be assigned in utility mode

Open asimrp opened this issue 4 years ago • 1 comments

Greenplum version or build

master as of 992cc6f8e83, did not test on 6X.

Expected behavior

In a psql session connected in utility mode to a segment postmaster, queries such as select * from pg_stat_user_tables; should not cause assertion failure or crash when being planned.

Actual behavior

PGOPTIONS='-c gp_role=utility' psql -d postgres -p 7002
postgres=# select * from pg_stat_user_tables where relname='test';
FATAL:  Unexpected internal error (nodeMotion.c:622)
DETAIL:  FailedAssertion("!(node->motionID > 0)", File: "nodeMotion.c", Line: 622)

Step to reproduce the behavior

PGOPTIONS='-c gp_role=utility' psql -d postgres -p 7002
postgres=# select * from pg_stat_user_tables where relname='test';

Stack trace from the core:

(lldb) bt
* thread #1, stop reason = signal SIGSTOP
  * f #0:libsystem_kernel.dylib __pthread_kill
    f #1:libsystem_pthread.dylib pthread_kill
    f #2:libsystem_c.dylib abort
    f #3:postmaster ExceptionalCondition(conditionName="!(node->motionID > 0)", errorType="FailedAssertion", fileName="nodeMotion.c", lineNumber=622) at assert.c:66
    f #4:postmaster ExecInitMotion(node=0x00007fc8a01d7b48, estate=0x00007fc89e01fa50, eflags=16) at nodeMotion.c:622
    f #5:postmaster ExecInitNode(node=0x00007fc8a01d7b48, estate=0x00007fc89e01fa50, eflags=16) at execProcnode.c:438
    f #6:postmaster ExecInitAppend(node=0x00007fc8a01dc5b8, estate=0x00007fc89e01fa50, eflags=16) at nodeAppend.c:170
    f #7:postmaster ExecInitNode(node=0x00007fc8a01dc5b8, estate=0x00007fc89e01fa50, eflags=16) at execProcnode.c:227
    f #8:postmaster ExecInitHashJoin(node=0x00007fc8a01e0430, estate=0x00007fc89e01fa50, eflags=16) at nodeHashjoin.c:636
    f #9:postmaster ExecInitNode(node=0x00007fc8a01e0430, estate=0x00007fc89e01fa50, eflags=16) at execProcnode.c:368
    f #10:postmaster InitPlan(queryDesc=0x00007fc89e00b650, eflags=16) at execMain.c:1984
    f #11:postmaster standard_ExecutorStart(queryDesc=0x00007fc89e00b650, eflags=16) at execMain.c:486
    f #12:postmaster ExecutorStart(queryDesc=0x00007fc89e00b650, eflags=0) at execMain.c:210
    f #13:postmaster PortalStart(portal=0x00007fc89e01d850, params=0x0000000000000000, eflags=0, snapshot=0x0000000000000000, ddesc=0x0000000000000000) at pquery.c:709
    f #14:postmaster exec_simple_query(query_string="select * from pg_stat_user_tables where relname='test';") at postgres.c:1739
    f #15:postmaster PostgresMain(argc=1, argv=0x00007fc89f00ac68, dbname="postgres", username="apraveen") at postgres.c:5082
    f #16:postmaster BackendRun(port=0x00007fc89d404080) at postmaster.c:4793
    f #17:postmaster BackendStartup(port=0x00007fc89d404080) at postmaster.c:4461
    f #18:postmaster ServerLoop at postmaster.c:1951
    f #19:postmaster PostmasterMain(argc=5, argv=0x00007fc89b40abe0) at postmaster.c:1521
    f #20:postmaster main(argc=5, argv=0x00007fc89b40abe0) at main.c:240

asimrp avatar Sep 17 '20 05:09 asimrp

This quick change seems to fix the issue but it could introduce new regressions. I don't want to check that now, let me note the change here instead.

--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1599,7 +1599,7 @@ set_append_path_locus(PlannerInfo *root, Path *pathnode, RelOptInfo *rel,
        *subpaths_out = NIL;
 
        /* If no subpath, any worker can execute this Append.  Result has 0 rows. */
-       if (!subpaths)
+       if (!subpaths || Gp_role != GP_ROLE_DISPATCH)
        {
                CdbPathLocus_MakeGeneral(&pathnode->locus);
                return;

asimrp avatar Oct 14 '20 10:10 asimrp

@kainwen can you check ?

dpandhi-git avatar Dec 23 '22 01:12 dpandhi-git

I simplify the reproduce step as follows:

create view test as select C.oid AS relid, C.relname AS relname FROM pg_class C ;

PGOPTIONS='-c gp_role=utility' psql -d postgres -p 7002 SELECT allt.relid, allt.relname FROM gp_dist_random('test') allt inner join pg_class c on allt.relid = c.oid;

FATAL: Unexpected internal error (assert.c:44) DETAIL: FailedAssertion("!(node->motionID > 0)", File: "nodeMotion.c", Line: 683)

RCA: test is a view and it is the param of gp_dist_random, we call ApplyRetrieveRule to rewrite RangeTblEntry of test, change rte->rtekind to RTE_SUBQUERY, and forceDistRand of rte is true.

in set_subquery_pathlist set the locus of subquery is strewn

foreach(lc, sub_final_rel->pathlist) { Path *subpath = (Path *) lfirst(lc); Path *path; List *l; List *pathkeys; CdbPathLocus locus;

if (forceDistRand)
	CdbPathLocus_MakeStrewn(&locus, getgpsegmentCount());
else
	locus = cdbpathlocus_from_subquery(root, rel, subpath);

}

so we add a motion for pg_class in cdbpath_motion_for_join. however, in utility mode we should not has motion node and hit assert panic.

zxuejing avatar Mar 30 '23 08:03 zxuejing

@zxuejing @huansong will this https://github.com/greenplum-db/gpdb/commit/578bd4236880f58272b3a291f5cb659208e3397a fix the issue?

kainwen avatar Mar 30 '23 12:03 kainwen

Yes tested that https://github.com/greenplum-db/gpdb/commit/578bd4236880f58272b3a291f5cb659208e3397a has fixed the error cases here. Though the error behavior in #15238 and this one are different but I think the root cause should be the same. @zxuejing could you please check if we can close this issue?

huansong avatar Mar 30 '23 15:03 huansong

Hi @huansong, @kainwen the RCA of #15238 and this issue is the same. I will close this issue.

zxuejing avatar Mar 31 '23 02:03 zxuejing