Crash in auto_explain
Could be related to #2009.
I tried running multi_check when auto_explain is loaded. It seems to crash for tests with recursive planning with a backtrace like:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007fad43cdc859 in __GI_abort () at abort.c:79
#2 0x00005653e3f4363a in ExceptionalCondition (
conditionName=conditionName@entry=0x5653e415c780 "!(ActiveSnapshot != ((void *)0))",
errorType=errorType@entry=0x5653e3f9901d "FailedAssertion", fileName=fileName@entry=0x5653e415c547 "snapmgr.c",
lineNumber=lineNumber@entry=843) at assert.c:54
#3 0x00005653e3f85715 in GetActiveSnapshot () at snapmgr.c:843
#4 0x00005653e3f861ca in GetActiveSnapshot () at snapmgr.c:845
#5 0x00005653e3c2eaa8 in ExplainOnePlan (plannedstmt=plannedstmt@entry=0x7fad3590b9a0, into=into@entry=0x0,
es=es@entry=0x5653e5b59198, queryString=queryString@entry=0x0, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0,
planduration=0x7ffcd4ec9ef0) at explain.c:497
#6 0x00007fad40db6d3d in ExplainSubPlans (distributedPlan=0x7fad359139b0, distributedPlan=0x7fad359139b0, es=0x5653e5b59198)
at planner/multi_explain.c:217
#7 CitusExplainScan (node=<optimized out>, ancestors=<optimized out>, es=0x5653e5b59198) at planner/multi_explain.c:122
#8 0x00005653e3c2c977 in ExplainNode (planstate=<optimized out>, ancestors=ancestors@entry=0x0,
relationship=relationship@entry=0x0, plan_name=plan_name@entry=0x0, es=es@entry=0x5653e5b59198) at explain.c:1786
#9 0x00005653e3c2e736 in ExplainPrintPlan (es=es@entry=0x5653e5b59198, queryDesc=queryDesc@entry=0x5653e5c1fa68)
at explain.c:705
#10 0x00007fad4488256f in explain_ExecutorEnd (queryDesc=0x5653e5c1fa68) at auto_explain.c:388
#11 0x00005653e3c49ece in PortalCleanup (portal=<optimized out>) at portalcmds.c:301
#12 0x00005653e3f74c85 in PortalDrop (portal=0x5653e5b2e408, isTopCommit=<optimized out>) at portalmem.c:499
#13 0x00005653e3e1a49e in exec_simple_query (
query_string=0x5653e5a57ac8 "with x as (select a, random() from t) select random(), x.* from x;") at postgres.c:1225
#14 0x00005653e3e1bd23 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x5653e5af45d0, dbname=<optimized out>,
username=<optimized out>) at postgres.c:4247
#15 0x00005653e3d9269a in BackendRun (port=0x5653e5af20b0, port=0x5653e5af20b0) at postmaster.c:4437
#16 BackendStartup (port=0x5653e5af20b0) at postmaster.c:4128
#17 ServerLoop () at postmaster.c:1704
#18 0x00005653e3d93512 in PostmasterMain (argc=3, argv=<optimized out>) at postmaster.c:1377
#19 0x00005653e3aba651 in main (argc=3, argv=0x5653e5a51510) at main.c:228
We'd love to see Citus support auto_explain
Fixed. This is a bug in auto_explain, not in Citus specifically: it can occur under a CustomScan node, and Citus is only one such case. After PortalRun, PostgreSQL pops all the snapshots it pushed, so auto_explain needs to manage the snapshot itself. We released an auto_explain fork that fixes the issue.
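To illustrate the idea of "managing the snapshot itself", here is a minimal toy model in C. It is not PostgreSQL code and not the fork's patch: every name in it (`push_active_snapshot`, `explain_subplan_guarded`, and so on) is hypothetical. It only shows the pattern: if no snapshot is active when the explain code runs, push one, do the work, then pop only what was pushed. In PostgreSQL itself the corresponding calls would presumably be `ActiveSnapshotSet()`, `PushActiveSnapshot()` and `PopActiveSnapshot()` from snapmgr.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names below are hypothetical, not PostgreSQL's API. */
#define MAX_DEPTH 8

typedef struct Snapshot { int id; } Snapshot;

static Snapshot *active_stack[MAX_DEPTH];
static int active_depth = 0;

/* True if at least one snapshot is on the active stack. */
static bool active_snapshot_set(void)
{
    return active_depth > 0;
}

static void push_active_snapshot(Snapshot *s)
{
    assert(active_depth < MAX_DEPTH);
    active_stack[active_depth++] = s;
}

static void pop_active_snapshot(void)
{
    assert(active_depth > 0);
    active_depth--;
}

/*
 * Stands in for explain code that needs an active snapshot.
 * Returns the snapshot id, or -1 where an assert-enabled server
 * would abort because nothing is active.
 */
static int explain_subplan(void)
{
    if (!active_snapshot_set())
        return -1;
    return active_stack[active_depth - 1]->id;
}

/*
 * Guarded variant: push a fallback snapshot only when none is active,
 * and pop only the snapshot pushed here, leaving the stack as found.
 */
static int explain_subplan_guarded(Snapshot *fallback)
{
    bool pushed = false;
    int result;

    if (!active_snapshot_set())
    {
        push_active_snapshot(fallback);
        pushed = true;
    }
    result = explain_subplan();
    if (pushed)
        pop_active_snapshot();
    return result;
}
```

With an empty stack, which models the state after the portal has popped its snapshots, `explain_subplan()` fails while `explain_subplan_guarded()` succeeds and leaves the stack empty again.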
Has this fix been raised or proposed upstream at all? We'd love to see this fixed in the upstream.
Easy way to repro on Citus:
LOAD 'auto_explain';
CREATE TABLE test(a int);
SELECT create_distributed_table('test', 'a');
INSERT INTO test SELECT i FROM generate_series(0,1000000)i;
set auto_explain.log_min_duration to 0;
WITH cte_1 AS (SELECT * FROM test LIMIT 1) SELECT count(*) FROM cte_1;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
Time: 48.723 ms