gpdb icon indicating copy to clipboard operation
gpdb copied to clipboard

TPC-DS Q116 PANIC under ORCA with force multi stage agg

Open kainwen opened this issue 3 years ago • 0 comments

Bug Report

Greenplum version or build

master branch with top commit (86d7da285c3fdd95d458d6a77b929c7d42220253)

gpadmin@mdw:~/benchmarks/tpcds/sql$ psql tpcds
Timing is on.
psql (12beta2)
Type "help" for help.

tpcds=# \i 116.sql
psql:116.sql:29: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
psql:116.sql:29: fatal: connection to server was lost
gpadmin@mdw:~/benchmarks/tpcds/sql$ psql
psql: error: FATAL:  database "gpadmin" does not exist
gpadmin@mdw:~/benchmarks/tpcds/sql$ psql postgres
Timing is on.
psql (12beta2)
Type "help" for help.

postgres=# \q
gpadmin@mdw:~/benchmarks/tpcds/sql$ git log
fatal: not a git repository (or any of the parent directories): .git
gpadmin@mdw:~/benchmarks/tpcds/sql$ psql
psql: error: FATAL:  database "gpadmin" does not exist
gpadmin@mdw:~/benchmarks/tpcds/sql$ psql tpcds
Timing is on.
psql (12beta2)
Type "help" for help.

tpcds=#
tpcds=# show optimizer;
 optimizer
-----------
 on
(1 row)

Time: 0.439 ms
tpcds=# show optimizer_force_multistage_agg ;
 optimizer_force_multistage_agg
--------------------------------
 on
(1 row)

Time: 0.396 ms
tpcds=# \i 116.sql
psql:116.sql:29: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
psql:116.sql:29: fatal: connection to server was lost

gpadmin@mdw:~/benchmarks/tpcds/sql$ psql tpcds
Timing is on.
psql (12beta2)
Type "help" for help.

tpcds=# show optimizer_force_multistage_agg ;
 optimizer_force_multistage_agg
--------------------------------
 on
(1 row)

Time: 0.745 ms
tpcds=# show optimizer;
 optimizer
-----------
 on
(1 row)

Time: 0.339 ms
tpcds=# set optimizer_force_multistage_agg = 0;
SET
Time: 40.681 ms
tpcds=# \i 116.sql
 order count | total shipping cost | total net profit
-------------+---------------------+------------------
       11143 |         53182148.24 |     -10560287.35
(1 row)

Time: 51721.498 ms (00:51.721)
tpcds=# select version();
                                                                                                               version
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 12beta2 (Greenplum Database 7.0.0-alpha.0+dev.15670.g86d7da285c build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit compiled on Aug 24 2022 11:14:29 (with assert checking)
(1 row)

Time: 33.931 ms
tpcds=#

The callstack of the coredump:

Core was generated by `postgres:  5432, gpadmin tpcds [local] con8 cmd11 SELECT                      '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7fb2f09ff1c0 (LWP 1939687))]
(gdb) bt
#0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000560f99bad027 in StandardHandlerForSigillSigsegvSigbus_OnMainThread (processName=0x560f9a2bf453 "Master process", postgres_signal_arg=11) at elog.c:5126
#2  0x0000560f999d7ff5 in CdbProgramErrorHandler (postgres_signal_arg=11) at postgres.c:3643
#3  <signal handler called>
#4  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:312
#5  0x0000560f99abf35f in accum_sum_copy (dst=0x560fa81ae938, src=0x560fa85f4861) at numeric.c:9878
#6  0x0000560f99ab3907 in numeric_avg_combine (fcinfo=0x560fa822c9d0) at numeric.c:4197
#7  0x0000560f996f148b in ExecInterpExpr (state=0x560fa822d218, econtext=0x560fa1517ab8, isnull=0x7ffcb726f8f7) at execExprInterp.c:1735
#8  0x0000560f996f17f9 in ExecInterpExprStillValid (state=0x560fa822d218, econtext=0x560fa1517ab8, isNull=0x7ffcb726f8f7) at execExprInterp.c:1868
#9  0x0000560f9971bff8 in ExecEvalExprSwitchContext (state=0x560fa822d218, econtext=0x560fa1517ab8, isNull=0x7ffcb726f8f7) at ../../../src/include/executor/executor.h:353
#10 0x0000560f9971cb9b in advance_aggregates (aggstate=0x560fa1517680) at nodeAgg.c:847
#11 0x0000560f9971fcad in agg_retrieve_direct (aggstate=0x560fa1517680) at nodeAgg.c:2506
#12 0x0000560f9971f710 in ExecAgg (pstate=0x560fa1517680) at nodeAgg.c:2231
#13 0x0000560f99708b4d in ExecProcNodeGPDB (node=0x560fa1517680) at execProcnode.c:623
#14 0x0000560f99708a22 in ExecProcNodeFirst (node=0x560fa1517680) at execProcnode.c:584
#15 0x0000560f9974e2e7 in ExecProcNode (node=0x560fa1517680) at ../../../src/include/executor/executor.h:268
#16 0x0000560f9974e5a3 in ExecSort (pstate=0x560fa15173a8) at nodeSort.c:150
#17 0x0000560f99708b4d in ExecProcNodeGPDB (node=0x560fa15173a8) at execProcnode.c:623
#18 0x0000560f99708a22 in ExecProcNodeFirst (node=0x560fa15173a8) at execProcnode.c:584
#19 0x0000560f9973ba90 in ExecProcNode (node=0x560fa15173a8) at ../../../src/include/executor/executor.h:268
#20 0x0000560f9973bc7b in ExecLimit_guts (pstate=0x560fa1517098) at nodeLimit.c:98
#21 0x0000560f9973c074 in ExecLimit (node=0x560fa1517098) at nodeLimit.c:246
#22 0x0000560f99708b4d in ExecProcNodeGPDB (node=0x560fa1517098) at execProcnode.c:623
#23 0x0000560f99708a22 in ExecProcNodeFirst (node=0x560fa1517098) at execProcnode.c:584
#24 0x0000560f996fa91a in ExecProcNode (node=0x560fa1517098) at ../../../src/include/executor/executor.h:268
#25 0x0000560f996ff387 in ExecutePlan (estate=0x560fa152ca70, planstate=0x560fa1517098, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x560fa7f849f0, execute_once=true)
    at execMain.c:2700
#26 0x0000560f996fc4eb in standard_ExecutorRun (queryDesc=0x560fa140a0e0, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:1005
#27 0x0000560f996fbfaa in ExecutorRun (queryDesc=0x560fa140a0e0, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:828
#28 0x0000560f999dd9b1 in PortalRunSelect (portal=0x560fa14b10b0, forward=true, count=0, dest=0x560fa7f849f0) at pquery.c:1109
#29 0x0000560f999dd595 in PortalRun (portal=0x560fa14b10b0, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x560fa7f849f0, altdest=0x560fa7f849f0, completionTag=0x7ffcb7270030 "") at pquery.c:947
#30 0x0000560f999d51f8 in exec_simple_query (
    query_string=0x560fa13e30d0 "select  \n   count(distinct cs_order_number) as \"order count\"\n  ,sum(cs_ext_ship_cost) as \"total shipping cost\"\n  ,sum(cs_net_profit) as \"total net profit\"\nfrom\n   catalog_sales cs1\n  ,date_dim\n  ,cust"...) at postgres.c:1922
#31 0x0000560f999da686 in PostgresMain (argc=1, argv=0x560fa1412918, dbname=0x560fa1412758 "tpcds", username=0x560fa1412738 "gpadmin") at postgres.c:5353
#32 0x0000560f9990b968 in BackendRun (port=0x560fa1404db0) at postmaster.c:4946
#33 0x0000560f9990affe in BackendStartup (port=0x560fa1404db0) at postmaster.c:4631
#34 0x0000560f99906805 in ServerLoop () at postmaster.c:1964
#35 0x0000560f99905dd5 in PostmasterMain (argc=5, argv=0x560fa13de130) at postmaster.c:1590
#36 0x0000560f997af872 in main (argc=5, argv=0x560fa13de130) at main.c:240
(gdb)

Q116's SQL:

select
   count(distinct cs_order_number) as "order count"
  ,sum(cs_ext_ship_cost) as "total shipping cost"
  ,sum(cs_net_profit) as "total net profit"
from
   catalog_sales cs1
  ,date_dim
  ,customer_address
  ,call_center
where
    d_date between '2002-4-01' and
           (cast('2002-4-01' as date) + 60 )
and cs1.cs_ship_date_sk = d_date_sk
and cs1.cs_ship_addr_sk = ca_address_sk
and ca_state = 'TN'
and cs1.cs_call_center_sk = cc_call_center_sk
and cc_county in ('Walker County','Oglethorpe County','Huron County','Wadena County',
                  'Ziebach County'
)
and exists (select *
            from catalog_sales cs2
            where cs1.cs_order_number = cs2.cs_order_number
              and cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk)
and not exists(select *
               from catalog_returns cr1
               where cs1.cs_order_number = cr1.cr_order_number)
order by count(distinct cs_order_number)
limit 100;

kainwen avatar Aug 24 '22 14:08 kainwen