pg_pathman icon indicating copy to clipboard operation
pg_pathman copied to clipboard

some SQL will fail when use pg_pathman and auto_explain simultaneously

Open ChenHuajun opened this issue 5 years ago • 4 comments

Problem description

pg_pathman and auto_explain are not compatible, which will cause some SQL to fail to execute.

sample:

create table pathman_tb(
 id int,
 statis_date date not null
);
select create_range_partitions(
  'pathman_tb',
 'statis_date',
 '20190101'::date,
 interval '1 d',
 10,
 false
);
insert into pathman_tb(id, statis_date) values (1,'20190101');

set auto_explain.log_min_duration=0;
SELECT count(*) from pathman_tb a  
   where a.statis_date='20190101' and
      (select count(*) from pathman_tb b where a.statis_date=b.statis_date)=1;

will cause error:

postgres=# SELECT count(*) from pathman_tb a  
postgres-#    where a.statis_date='20190101' and
postgres-#       (select count(*) from pathman_tb b where a.statis_date=b.statis_date)=1;
ERROR:  unrecognized node type: 63

callstack

2019-03-11 17:40:21.340 CST 22165 [local] postgres postgres psql ERROR:  XX000: unrecognized node type: 63
2019-03-11 17:40:21.340 CST 22165 [local] postgres postgres psql LOCATION:  ExecInitNode, execProcnode.c:368
2019-03-11 17:40:21.340 CST 22165 [local] postgres postgres psql STATEMENT:  SELECT count(*) from pathman_tb a
           where a.statis_date='20190101' and
              (select count(*) from pathman_tb b where a.statis_date=b.statis_date)=1;
(gdb) bt
#0  ExecInitNode (node=0x2750d58, estate=0x2729fa8, eflags=0) at execProcnode.c:368
#1  0x00007ff3b56d1d2b in explain_append_common () from /usr/pgsql-10/lib/pg_pathman.so
#2  0x0000000000596c52 in ExplainNode (planstate=<value optimized out>, ancestors=0x26872d0, relationship=0x94ea30 "Outer", plan_name=<value optimized out>, es=0x2685908) at explain.c:1551
#3  0x0000000000595ed8 in ExplainNode (planstate=<value optimized out>, ancestors=0x26872d0, relationship=0x94ea54 "SubPlan", plan_name=<value optimized out>, es=0x2685908) at explain.c:1729
#4  0x00000000005957e8 in ExplainSubPlans (plans=<value optimized out>, ancestors=0x26872d0, relationship=0x94ea54 "SubPlan", es=0x2685908) at explain.c:2973
#5  0x0000000000595f41 in ExplainNode (planstate=<value optimized out>, ancestors=<value optimized out>, relationship=0x94ea5c "Member", plan_name=<value optimized out>, es=0x2685908)
    at explain.c:1779
#6  0x0000000000598155 in ExplainMemberNodes (plans=<value optimized out>, planstates=<value optimized out>, ancestors=0x26872d0, es=0x2685908) at explain.c:2937
#7  0x0000000000595f27 in ExplainNode (planstate=<value optimized out>, ancestors=<value optimized out>, relationship=0x94ea30 "Outer", plan_name=<value optimized out>, es=0x2685908)
    at explain.c:1761
#8  0x0000000000595ed8 in ExplainNode (planstate=<value optimized out>, ancestors=0x26872d0, relationship=0x0, plan_name=<value optimized out>, es=0x2685908) at explain.c:1729
#9  0x00000000005980f8 in ExplainPrintPlan (es=0x2685908, queryDesc=0x2739938) at explain.c:641
#10 0x00007ff3b52a64c2 in explain_ExecutorEnd (queryDesc=0x2739938) at auto_explain.c:334
#11 0x00000000005af67e in PortalCleanup (portal=0x273aed8) at portalcmds.c:302
#12 0x000000000084d19a in PortalDrop (portal=0x273aed8, isTopCommit=0 '\000') at portalmem.c:489
#13 0x0000000000723f62 in exec_simple_query (
    query_string=0x268a788 "SELECT count(*) from pathman_tb a  \n   where a.statis_date='20190101' and\n      (select count(*) from pathman_tb b where a.statis_date=b.statis_date)=1;")
    at postgres.c:1109
#14 0x0000000000724f09 in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, dbname=0x2630f48 "postgres", username=<value optimized out>) at postgres.c:4088
#15 0x00000000006b8c8a in BackendRun (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:4405
#16 BackendStartup (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:4077
#17 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1755
#18 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1363
#19 0x00000000006395e0 in main (argc=4, argv=0x25f99e0) at main.c:228
(gdb) p *node
$1 = {type = T_SeqScanState, startup_cost = 2.0343548220029999e-316, total_cost = 2.0289501390900134e-316, plan_rows = 3.154700056775194e-317, plan_width = 6385184, 
  parallel_aware = 0 '\000', parallel_safe = 0 '\000', plan_node_id = 0, targetlist = 0x0, qual = 0x2753718, lefttree = 0x0, righttree = 0x0, initPlan = 0x0, extParam = 0x0, allParam = 0x0}

Environment

  • pg_pathman 1.5(master code) and 1.4.9 and 1.4.13
  • PostgreSQL 10.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18), 64-bit

ChenHuajun avatar Mar 11 '19 13:03 ChenHuajun

hi ChenHuajun 。 我正在使用citus8.2-1,有一个问题想请教一下。 有下面一个场景,创建索引的时候报错了。deadlock。不知为何?

有一个分区表,创建好表,做分布,创建分区,再创建索引,报错:

error:canceling the transaction since it was involved in a distributed deadlock

qzren1982 avatar May 14 '19 03:05 qzren1982

并不是所有的索引都会报错。

qzren1982 avatar May 14 '19 03:05 qzren1982

“分区表,创建好表,做分布,创建分区,再创建索引”确认一下这些操作有没有放在一个事务里。And, You could report problem of Citus to https://github.com/citusdata/citus。

ChenHuajun avatar May 19 '19 04:05 ChenHuajun

ERROR: unrecognized node type: 63 是啥原因呢?

372046933 avatar Jun 20 '19 09:06 372046933