citus Crash in auto_explain

Could be related to #2009

Sep 02 '19 13:09 metdos

I tried running multi_check with auto_explain loaded. It seems to crash on tests that use recursive planning, with a backtrace like:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fad43cdc859 in __GI_abort () at abort.c:79
#2  0x00005653e3f4363a in ExceptionalCondition (
    conditionName=conditionName@entry=0x5653e415c780 "!(ActiveSnapshot != ((void *)0))", 
    errorType=errorType@entry=0x5653e3f9901d "FailedAssertion", fileName=fileName@entry=0x5653e415c547 "snapmgr.c", 
    lineNumber=lineNumber@entry=843) at assert.c:54
#3  0x00005653e3f85715 in GetActiveSnapshot () at snapmgr.c:843
#4  0x00005653e3f861ca in GetActiveSnapshot () at snapmgr.c:845
#5  0x00005653e3c2eaa8 in ExplainOnePlan (plannedstmt=plannedstmt@entry=0x7fad3590b9a0, into=into@entry=0x0, 
    es=es@entry=0x5653e5b59198, queryString=queryString@entry=0x0, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, 
    planduration=0x7ffcd4ec9ef0) at explain.c:497
#6  0x00007fad40db6d3d in ExplainSubPlans (distributedPlan=0x7fad359139b0, distributedPlan=0x7fad359139b0, es=0x5653e5b59198)
    at planner/multi_explain.c:217
#7  CitusExplainScan (node=<optimized out>, ancestors=<optimized out>, es=0x5653e5b59198) at planner/multi_explain.c:122
#8  0x00005653e3c2c977 in ExplainNode (planstate=<optimized out>, ancestors=ancestors@entry=0x0, 
    relationship=relationship@entry=0x0, plan_name=plan_name@entry=0x0, es=es@entry=0x5653e5b59198) at explain.c:1786
#9  0x00005653e3c2e736 in ExplainPrintPlan (es=es@entry=0x5653e5b59198, queryDesc=queryDesc@entry=0x5653e5c1fa68)
    at explain.c:705
#10 0x00007fad4488256f in explain_ExecutorEnd (queryDesc=0x5653e5c1fa68) at auto_explain.c:388
#11 0x00005653e3c49ece in PortalCleanup (portal=<optimized out>) at portalcmds.c:301
#12 0x00005653e3f74c85 in PortalDrop (portal=0x5653e5b2e408, isTopCommit=<optimized out>) at portalmem.c:499
#13 0x00005653e3e1a49e in exec_simple_query (
    query_string=0x5653e5a57ac8 "with x as (select a, random() from t) select random(), x.* from x;") at postgres.c:1225
#14 0x00005653e3e1bd23 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x5653e5af45d0, dbname=<optimized out>, 
    username=<optimized out>) at postgres.c:4247
#15 0x00005653e3d9269a in BackendRun (port=0x5653e5af20b0, port=0x5653e5af20b0) at postmaster.c:4437
#16 BackendStartup (port=0x5653e5af20b0) at postmaster.c:4128
#17 ServerLoop () at postmaster.c:1704
#18 0x00005653e3d93512 in PostmasterMain (argc=3, argv=<optimized out>) at postmaster.c:1377
#19 0x00005653e3aba651 in main (argc=3, argv=0x5653e5a51510) at main.c:228

May 11 '20 17:05 pykello

We'd love to see Citus support auto_explain.

Nov 28 '21 22:11 scottybrisbane

Fixed. This is a bug in auto_explain, but it can occur under any CustomScan node; Citus is only a special case. After PortalRun, PostgreSQL pops all the snapshots it pushed, so auto_explain needs to manage the snapshot itself. We released an auto_explain fork that fixes the issue: https://github.com/hslightdb/auto_explain
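To make the snapshot point concrete, here is a minimal toy model in C. It is not PostgreSQL or fork code: it assumes the active-snapshot stack can be modeled as a small array, and every `toy_*` name and `ToySnapshot` is hypothetical. The guarded hook pushes a snapshot only when the stack is empty and pops only what it pushed. In a real extension the corresponding snapmgr calls would presumably be ActiveSnapshotSet(), PushActiveSnapshot(GetTransactionSnapshot()) and PopActiveSnapshot(); whether the fork does exactly this is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the active snapshot stack (NOT the real snapmgr.c). */
#define MAX_SNAPSHOTS 8

typedef struct ToySnapshot { int xmin; } ToySnapshot;

static ToySnapshot *snapshot_stack[MAX_SNAPSHOTS];
static int snapshot_depth = 0;

/* True when at least one snapshot is on the stack. */
static bool toy_active_snapshot_set(void)
{
    return snapshot_depth > 0;
}

static void toy_push_active_snapshot(ToySnapshot *snap)
{
    assert(snapshot_depth < MAX_SNAPSHOTS);
    snapshot_stack[snapshot_depth++] = snap;
}

static void toy_pop_active_snapshot(void)
{
    assert(snapshot_depth > 0);
    snapshot_stack[--snapshot_depth] = NULL;
}

/*
 * Models an explain callback that needs an active snapshot, like
 * ExplainOnePlan calling GetActiveSnapshot() in the backtrace.
 * Returns false where the real server would hit the failed assertion.
 */
static bool toy_explain_custom_scan(void)
{
    return toy_active_snapshot_set();
}

/* Unguarded ExecutorEnd-style hook: explains without checking the stack. */
static bool toy_executor_end_unguarded(void)
{
    return toy_explain_custom_scan();
}

/*
 * Guarded hook: push a fallback snapshot only if none is active,
 * explain, then pop only the snapshot this function pushed.
 */
static bool toy_executor_end_guarded(ToySnapshot *fallback)
{
    bool pushed = false;
    bool ok;

    if (!toy_active_snapshot_set())
    {
        toy_push_active_snapshot(fallback);
        pushed = true;
    }

    ok = toy_explain_custom_scan();

    if (pushed)
        toy_pop_active_snapshot();

    return ok;
}
```

With an empty stack (the state after the portal has popped its snapshots), the unguarded hook fails while the guarded one succeeds and leaves the stack depth unchanged.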

Feb 08 '22 08:02 hslightdb

> Fixed. This is a bug in auto_explain, but it can occur under any CustomScan node; Citus is only a special case. After PortalRun, PostgreSQL pops all the snapshots it pushed, so auto_explain needs to manage the snapshot itself. We released an auto_explain fork that fixes the issue: https://github.com/hslightdb/auto_explain

Has this fix been raised or proposed upstream at all? We'd love to see it fixed upstream.

Mar 11 '22 03:03 scottybrisbane

Easy way to repro on Citus:


LOAD 'auto_explain';
CREATE TABLE test(a int);
SELECT create_distributed_table('test', 'a');
INSERT INTO test SELECT i FROM generate_series(0,1000000)i;

SET auto_explain.log_min_duration TO 0;
WITH cte_1 AS (SELECT * FROM test LIMIT 1) SELECT count(*) FROM cte_1;
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
Time: 48.723 ms
 @:-!>

Sep 26 '22 11:09 onderkalaci
