citus
Crash in auto_explain
Could be related to #2009
I tried running multi_check when auto_explain is loaded. It seems to crash for tests with recursive planning with a backtrace like:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007fad43cdc859 in __GI_abort () at abort.c:79
#2 0x00005653e3f4363a in ExceptionalCondition (
conditionName=conditionName@entry=0x5653e415c780 "!(ActiveSnapshot != ((void *)0))",
errorType=errorType@entry=0x5653e3f9901d "FailedAssertion", fileName=fileName@entry=0x5653e415c547 "snapmgr.c",
lineNumber=lineNumber@entry=843) at assert.c:54
#3 0x00005653e3f85715 in GetActiveSnapshot () at snapmgr.c:843
#4 0x00005653e3f861ca in GetActiveSnapshot () at snapmgr.c:845
#5 0x00005653e3c2eaa8 in ExplainOnePlan (plannedstmt=plannedstmt@entry=0x7fad3590b9a0, into=into@entry=0x0,
es=es@entry=0x5653e5b59198, queryString=queryString@entry=0x0, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0,
planduration=0x7ffcd4ec9ef0) at explain.c:497
#6 0x00007fad40db6d3d in ExplainSubPlans (distributedPlan=0x7fad359139b0, distributedPlan=0x7fad359139b0, es=0x5653e5b59198)
at planner/multi_explain.c:217
#7 CitusExplainScan (node=<optimized out>, ancestors=<optimized out>, es=0x5653e5b59198) at planner/multi_explain.c:122
#8 0x00005653e3c2c977 in ExplainNode (planstate=<optimized out>, ancestors=ancestors@entry=0x0,
relationship=relationship@entry=0x0, plan_name=plan_name@entry=0x0, es=es@entry=0x5653e5b59198) at explain.c:1786
#9 0x00005653e3c2e736 in ExplainPrintPlan (es=es@entry=0x5653e5b59198, queryDesc=queryDesc@entry=0x5653e5c1fa68)
at explain.c:705
#10 0x00007fad4488256f in explain_ExecutorEnd (queryDesc=0x5653e5c1fa68) at auto_explain.c:388
#11 0x00005653e3c49ece in PortalCleanup (portal=<optimized out>) at portalcmds.c:301
#12 0x00005653e3f74c85 in PortalDrop (portal=0x5653e5b2e408, isTopCommit=<optimized out>) at portalmem.c:499
#13 0x00005653e3e1a49e in exec_simple_query (
query_string=0x5653e5a57ac8 "with x as (select a, random() from t) select random(), x.* from x;") at postgres.c:1225
#14 0x00005653e3e1bd23 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x5653e5af45d0, dbname=<optimized out>,
username=<optimized out>) at postgres.c:4247
#15 0x00005653e3d9269a in BackendRun (port=0x5653e5af20b0, port=0x5653e5af20b0) at postmaster.c:4437
#16 BackendStartup (port=0x5653e5af20b0) at postmaster.c:4128
#17 ServerLoop () at postmaster.c:1704
#18 0x00005653e3d93512 in PostmasterMain (argc=3, argv=<optimized out>) at postmaster.c:1377
#19 0x00005653e3aba651 in main (argc=3, argv=0x5653e5a51510) at main.c:228
We'd love to see Citus support auto_explain.
Fixed. This is a bug in auto_explain, but it can occur under any CustomScan node; Citus is only a special case. After PortalRun, PostgreSQL pops all the snapshots it pushed, so auto_explain needs to manage the snapshot itself. We released a fork of auto_explain that fixes the issue: https://github.com/hslightdb/auto_explain
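To illustrate the mechanism described above, here is a toy Python model of an active-snapshot stack. It is not PostgreSQL code and not the code from the fork: `SnapshotStack`, `run_query`, and both hook functions are hypothetical names made up for this sketch. It assumes the fix amounts to the ExecutorEnd hook pushing its own snapshot when none is active and popping it afterwards.

```python
class SnapshotStack:
    """Toy stand-in for PostgreSQL's active-snapshot stack (hypothetical model)."""

    def __init__(self):
        self._stack = []

    def push(self, snapshot):
        self._stack.append(snapshot)

    def pop(self):
        self._stack.pop()

    def is_set(self):
        return bool(self._stack)

    def get_active(self):
        # Models the failed assertion from the backtrace:
        # "!(ActiveSnapshot != ((void *)0))"
        if not self._stack:
            raise RuntimeError("no active snapshot")
        return self._stack[-1]


def explain_subplans(snapshots):
    # Stand-in for an EXPLAIN path that needs an active snapshot,
    # as ExplainOnePlan does in the backtrace.
    return f"explained under {snapshots.get_active()}"


def executor_end_hook_unguarded(snapshots):
    # Models the crashing behaviour: assumes a snapshot is still active.
    return explain_subplans(snapshots)


def executor_end_hook_guarded(snapshots):
    # Assumed fix: push a snapshot only if none is active, and always
    # pop what we pushed so the stack is left as we found it.
    pushed = not snapshots.is_set()
    if pushed:
        snapshots.push("hook snapshot")
    try:
        return explain_subplans(snapshots)
    finally:
        if pushed:
            snapshots.pop()


def run_query(hook):
    snapshots = SnapshotStack()
    snapshots.push("portal snapshot")  # pushed while the portal runs
    snapshots.pop()                    # popped again once the run finishes
    # The hook fires later, during portal cleanup, with an empty stack.
    return hook(snapshots), snapshots
```

In this model, `run_query(executor_end_hook_unguarded)` raises because the stack is already empty when the hook runs, while the guarded variant completes and leaves the stack empty again.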
Has this fix been reported or proposed upstream? We'd love to see it fixed there.
Easy way to repro on Citus:
LOAD 'auto_explain';
CREATE TABLE test(a int);
SELECT create_distributed_table('test', 'a');
INSERT INTO test SELECT i FROM generate_series(0,1000000)i;
set auto_explain.log_min_duration to 0;
WITH cte_1 AS (SELECT * FROM test LIMIT 1) SELECT count(*) FROM cte_1;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
Time: 48.723 ms
@:-!>