gpdb Segment Unexpected internal error: Segment process received signal SIGSEGV

Bug Report

Greenplum version or build

Greenplum Database 6.19.4 build commit:953778b47d418bb463e4abb2d982ba27dd281010 Open Source

OS version and uname -a

Linux mdw 3.10.0-1160.59.1.el7.x86_64 #1 SMP Wed Feb 23 16:47:03 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

autoconf options used ( config.status --config )

Installation information ( pg_config )

postgresql.conf:
checkpoint_segments=8
gp_contentid=-1
gp_vmem_protect_limit=75018
work_mem=2048MB
effective_cache_size=128GB
maintenance_work_mem=4GB
log_min_duration_statement=10s
log_statement=none
log_filename='gpdb-%a.log'
log_truncate_on_rotation=on
log_rotation_age=1d
statement_mem=1024MB
max_statement_mem=2048MB
statement_timeout=1440s
gp_enable_global_deadlock_detector=on
gp_resqueue_priority_cpucores_per_segment=96
max_prepared_transactions=1000
max_connections=1000
gp_enable_query_metrics=on
shared_preload_libraries='metrics_collector'

free -m
              total        used        free      shared  buff/cache   available
Mem:         772426       19227      679453       30139       73744      721221
Swap:             0           0           0

Expected behavior

Actual behavior

Steps to reproduce the behavior

1. My table definition is:

CREATE TABLE public.table1(
    id bigint NOT NULL,
    first_access_time bigint,
    access_time bigint,
    ..... .... ...
)
DISTRIBUTED BY (id)
PARTITION BY RANGE(first_access_time)
(
    PARTITION pn_46 START (1646683200000::bigint) END (1646769600000::bigint) EVERY (86400000::bigint) WITH (tablename='alarm_collection_1_prt_pn_46', appendonly='false'),
    PARTITION pn_47 START (1646769600000::bigint) END (1646856000000::bigint) EVERY (86400000::bigint) WITH (tablename='alarm_collection_1_prt_pn_47', appendonly='false'),
    .... ... ..(216 partition)
    DEFAULT PARTITION other WITH (tablename='alarm_collection_1_prt_other', appendonly='false')
);

2. select count(*) from table1;
 count
-------
 61864
(1 row)

3. My SQL is:

select count(*) from table1 WHERE ((table1.group_ids && '{8}') OR (table1.group_ids && '{35}') OR (table1.group_ids && '{61}') OR (table1.group_ids && '{91}') OR (table1.group_ids && '{162}') OR (table1.group_ids && '{196}') OR (table1.group_ids && '{199}') OR ,......... ,......... ,......... (tens of thousands of such conditions)

After the query is executed, the segment PANICs.
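For anyone trying to reproduce this without the original data, here is a minimal sketch that generates a query of the same shape with a configurable number of OR'd overlap conditions. The table and column names come from the report above; the helper names and the id list are hypothetical. The second helper builds a single overlap test against one merged array, which should be logically equivalent to the OR chain (`&&` is true when the arrays share at least one element). Whether that form avoids the crash has not been tested here; it is only offered as a comparison case.

```python
from typing import Iterable


def or_chain_predicate(column: str, ids: Iterable[int]) -> str:
    """Build the OR chain of single-element overlap tests, as in the report."""
    id_list = [int(i) for i in ids]
    if not id_list:
        raise ValueError("at least one id is required")
    terms = [f"({column} && '{{{i}}}')" for i in id_list]
    return "(" + " OR ".join(terms) + ")"


def single_overlap_predicate(column: str, ids: Iterable[int]) -> str:
    """Build one overlap test against a merged array literal (comparison case)."""
    id_list = [int(i) for i in ids]
    if not id_list:
        raise ValueError("at least one id is required")
    merged = ",".join(str(i) for i in id_list)
    return f"({column} && '{{{merged}}}')"


def count_query(table: str, predicate: str) -> str:
    """Wrap a predicate in the count(*) query used in the report."""
    return f"select count(*) from {table} WHERE {predicate};"


if __name__ == "__main__":
    # Hypothetical id list; the report uses far more conditions.
    ids = [8, 35, 61, 91, 162, 196, 199]
    print(count_query("table1", or_chain_predicate("table1.group_ids", ids)))
    print(count_query("table1", single_overlap_predicate("table1.group_ids", ids)))
```

Increasing the length of the id list step by step may help narrow down the number of conditions at which the segment starts to crash.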

4. The pg_log is:

2022-09-08 13:04:39.208390 +04,,,p314615,th332970112,"10.3.206.110","26080",2022-09-08 13:04:39 +04,0,,,seg7,,,,,"LOG","00000","PID 313773 in cancel request did not match any process",,,,,,,0,,"postmaster.c",2724,
2022-09-08 13:04:39.555222 +04,"gpadmin",,p314695,th332970112,"10.3.206.110","26528",2022-09-08 13:04:39 +04,0,,,seg7,,,,,"LOG","00000","requesting fts retry as mirror didn't connect yet but in grace period: 3","pid zero at time: 0 accept connections start time: 1662627876",,,,,,0,,"gp_replication.c",535,
2022-09-08 13:04:40.068181 +04,,,p314639,th0,,,2022-09-08 13:04:39 +04,0,con272093585,cmd17,seg7,slice1,,,,"PANIC","XX000","Unexpected internal error: Segment process received signal SIGSEGV",,,,,,,0,,,,"1 0x7f66115b9630 libpthread.so.0 + 0x115b9630
2 0x8bc6c8 postgres ExecProcNode (execProcnode.c:1017)
3 0x8bc718 postgres ExecProcNode (execProcnode.c:990)
4 0x8e9e25 postgres ExecResult (tuptable.h:159)
5 0x8bc468 postgres ExecProcNode (execProcnode.c:970)
6 0x8fbf50 postgres ExecMotion (tuptable.h:159)
7 0x8bc548 postgres ExecProcNode (execProcnode.c:1121)
8 0x8b39f9 postgres (tuptable.h:159)
9 0x8b4744 postgres standard_ExecutorRun (execMain.c:2943)
10 0xa8ac57 postgres (pquery.c:1152)
11 0xa8cc41 postgres PortalRun (pquery.c:999)
12 0xa85029 postgres (postgres.c:1389)
13 0xa89c8a postgres PostgresMain (postgres.c:5412)
"
2022-09-08 13:04:40.068265 +04,,,p54713,th332970112,,,,0,,,seg7,,,,,"LOG","00000","server process (PID 314639) was terminated by signal 11: Segmentation fault","Failed process was running: SELECT count('*') AS cnt FROM alarm_collection WHERE ((table1.group_ids && '{8}') OR (table1.group_ids && '{35}') OR (table1.group_ids && '{61}') OR (table1.group_ids && '{91}') OR (table1.group_ids && '{162}') OR (table1.group_ids && '{196}') OR (table1.group_ids && '{199}') OR (table1.group_ids && '{266}') OR (table1.group_ids && '{272}') OR (table1.group_ids && '{302}') &&",,,,,,0,,"postmaster.c",3987,
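To compare backtraces across segments, the frames in the PANIC entry can be pulled out mechanically. The following is a rough sketch, assuming the frame layout seen in this log (`index address module [function] (file:line)`); the `Frame` type, the regular expression, and the function name are illustrative and not part of any Greenplum tooling. Frames without a `(file:line)` suffix, such as the signal handler frame in libpthread, are skipped.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    index: int
    address: str
    module: str
    function: Optional[str]  # None when the log omits the function name
    source_file: str
    line: int


# index, address, module, optional function name, then "(file:line)".
FRAME_RE = re.compile(
    r"\b(\d+)\s+(0x[0-9a-f]+)\s+(\S+)\s+(?:(\w+)\s+)?\(([\w.]+):(\d+)\)"
)


def parse_frames(trace: str) -> List[Frame]:
    """Extract source-located stack frames from a PANIC log field."""
    return [
        Frame(
            index=int(m.group(1)),
            address=m.group(2),
            module=m.group(3),
            function=m.group(4),
            source_file=m.group(5),
            line=int(m.group(6)),
        )
        for m in FRAME_RE.finditer(trace)
    ]


# A shortened excerpt of the trace from the log above.
SAMPLE = (
    "1 0x7f66115b9630 libpthread.so.0 + 0x115b9630 "
    "2 0x8bc6c8 postgres ExecProcNode (execProcnode.c:1017) "
    "8 0x8b39f9 postgres (tuptable.h:159)"
)

if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        print(frame)
```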

Sep 08 '22 10:09 dblife1024

Hi, could you please provide the following:

coredump
minirepro (doc how to use it https://community.pivotal.io/s/article/How-to-Collect-DDL-and-Statistics-Information-Using-the-Minirepro-Utility?language=en_US)
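Before looking for the coredump, it may be worth confirming that core files are enabled on the segment hosts. This is a small sketch, assuming Linux and run as the user that starts the segments; the function names are hypothetical. It reports the limits of the process running the script, not of an already running postgres process (for those, `/proc/<pid>/limits` would be the place to look).

```python
import resource
from pathlib import Path


def core_dump_settings() -> dict:
    """Collect the core-file size limits and the kernel core_pattern."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    try:
        pattern = Path("/proc/sys/kernel/core_pattern").read_text().strip()
    except OSError:
        # Not on Linux, or /proc is not readable in this environment.
        pattern = None
    return {"soft_limit": soft, "hard_limit": hard, "core_pattern": pattern}


def core_dumps_enabled(settings: dict) -> bool:
    """A soft limit of 0 means the kernel will not write core files."""
    return settings["soft_limit"] != 0


if __name__ == "__main__":
    settings = core_dump_settings()
    print(settings)
    print("core dumps enabled:", core_dumps_enabled(settings))
```

If `core_pattern` begins with `|`, cores are piped to a helper program rather than written next to the process, so the file may be stored somewhere other than the segment data directory.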

Sep 08 '22 11:09 kainwen

I'm really sorry, but I can't find the coredump; it seems that none was generated. I also work in the security industry and am bound by a confidentiality agreement, so I can't send the minirepro files. Can you help me based on the following error logs?

1. The segment error

Judging by the start time of the segment process, the segment process was not shut down, but the segment log contains some errors:

2022-09-09 10:05:54.127932 +04,,,p271674,th-866776960,,,,0,,,seg30,,,,,"LOG","00000","received fast shutdown request",,,,,,,0,,"postmaster.c",3090,
2022-09-09 10:05:54.127966 +04,,,p271674,th-866776960,,,,0,,,seg30,,,,,"LOG","00000","aborting any active transactions",,,,,,,0,,"postmaster.c",3116,
2022-09-09 10:05:54.128163 +04,,,p271807,th-866776960,,,,0,,,seg30,,,,,"FATAL","57P01","terminating background worker ""sweeper process"" due to administrator command",,,,,,,0,,"bgworker.c",565,
2022-09-09 10:05:54.130843 +04,,,p271674,th-866776960,,,,0,,,seg30,,,,,"LOG","00000","worker process: sweeper process (PID 271807) exited with exit code 1",,,,,,,0,,"postmaster.c",3965,
2022-09-09 10:05:54.131206 +04,,,p271674,th-866776960,,,,0,,,seg30,,,,,"LOG","00000","worker process: metrics collector (PID 271972) exited with exit code 1",,,,,,,0,,"postmaster.c",3965,
2022-09-09 10:05:54.734010 +04,,,p271803,th-866776960,,,,0,,,seg30,,,,,"LOG","00000","shutting down",,,,,,,0,,"xlog.c",8433,
2022-09-09 10:05:54.777894 +04,,,p271803,th-866776960,,,,0,,,seg30,,,,,"LOG","00000","database system is shut down",,,,,,,0,,"xlog.c",8468,
2022-09-09 10:06:48.232477 +04,,,p177466,th1834031232,,,,0,,,seg30,,,,,"LOG","00000","database system was shut down at 2022-09-09 10:05:54 +04",,,,,,,0,,"xlog.c",6421,
2022-09-09 10:06:48.232588 +04,"gpadmin",,p177467,th1834031232,"10.3.206.113","49328",2022-09-09 10:06:48 +04,0,,,seg30,,,,,"FATAL","57P03","the database system is starting up","last replayed record at 0/0",,,,,,0,,"postmaster.c",2556,
2022-09-09 10:06:48.232601 +04,"gpadmin","postgres",p177469,th1834031232,"[local]",,2022-09-09 10:06:48 +04,0,,,seg30,,,,,"FATAL","57P03","the database system is starting up","last replayed record at 0/0",,,,,,0,,"postmaster.c",2556,
2022-09-09 10:06:48.232611 +04,,,p177466,th1834031232,,,,0,,,seg30,,,,,"LOG","00000","end of transaction log location is 12/4B4BD5F0",,,,,,,0,,"xlog.c",7476,
2022-09-09 10:06:48.232620 +04,"gpadmin","postgres",p177469,th1834031232,"[local]",,2022-09-09 10:06:48 +04,0,,,seg30,,,,,"LOG","08006","could not send data to client: Broken pipe",,,,,,,0,,"pqcomm.c",1586,
2022-09-09 10:06:48.233372 +04,"gpadmin","postgres",p177470,th1834031232,"[local]",,2022-09-09 10:06:48 +04,0,,,seg30,,,,,"FATAL","57P03","the database system is starting up","last replayed record at 12/4B4BD570",,,,,,0,,"postmaster.c",2556,
2022-09-09 10:06:48.233831 +04,"gpadmin","postgres",p177468,th1834031232,"[local]",,2022-09-09 10:06:48 +04,0,,,seg30,,,,,"FATAL","57P03","the database system is starting up","last replayed record at 12/4B4BD570",,,,,,0,,"postmaster.c",2556,
2022-09-09 10:06:48.233962 +04,"gpadmin","postgres",p177468,th1834031232,"[local]",,2022-09-09 10:06:48 +04,0,,,seg30,,,,,"LOG","08006","could not send data to client: Broken pipe",,,,,,,0,,"pqcomm.c",1586,
2022-09-09 10:06:48.234392 +04,,,p177466,th1834031232,,,,0,,,seg30,,,,,"LOG","00000","latest completed transaction id is 1051406 and next transaction id is 1051407",,,,,,,0,,"xlog.c",7764,
2022-09-09 10:06:48.234763 +04,,,p177466,th1834031232,,,,0,,,seg30,,,,,"LOG","00000","MultiXact member wraparound protections are now enabled",,,,,,,0,,"multixact.c",2619,
2022-09-09 10:06:48.234781 +04,,,p177466,th1834031232,,,,0,,,seg30,,,,,"LOG","00000","database system is ready",,,,,,,0,,"xlog.c",7788,
2022-09-09 10:06:48.237422 +04,,,p177432,th1834031232,,,,0,,,seg30,,,,,"LOG","00000","starting background worker process ""sweeper process""",,,,,,,0,,"postmaster.c",6169,
2022-09-09 10:06:48.237669 +04,,,p177432,th1834031232,,,,0,,,seg30,,,,,"LOG","00000","PostgreSQL 9.4.26 (Greenplum Database 6.19.4 build commit:953778b47d418bb463e4abb2d982ba27dd281010 Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Mar 9 2022 00:51:26",,,,,,,0,,"postmaster.c",3290,
2022-09-09 10:06:48.237687 +04,,,p177432,th1834031232,,,,0,,,seg30,,,,,"LOG","00000","database system is ready to accept connections","PostgreSQL 9.4.26 (Greenplum Database 6.19.4 build commit:953778b47d418bb463e4abb2d982ba27dd281010 Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Mar 9 2022 00:51:26",,,,,,0,,"postmaster.c",3294,
2022-09-09 10:06:48.439391 +04,"gpadmin",,p177534,th1834031232,"10.3.206.113","49342",2022-09-09 10:06:48 +04,0,,,seg30,,,,,"LOG","00000","standby ""gp_walreceiver"" is now the synchronous standby with priority 1",,,,,,,0,,"syncrep.c",578,
2022-09-09 10:07:01.610097 +04,,,p177432,th1834031232,,,,0,,,seg30,,,,,"LOG","00000","registering background worker ""metrics collector""",,,,,,,0
2022-09-09 10:09:18.454108 +04,,,p177476,th1834031232,,,,0,,,seg30,,,,,"LOG","00000","activeWeight underflow!",,,,,,,0,,"backoff.c",934,
2022-09-09 10:09:23.319351 +04,,,p184638,th0,,,2022-09-09 10:09:22 +04,0,con212,cmd34,seg30,slice1,,,,"PANIC","XX000","Unexpected internal error: Segment process received signal SIGSEGV",,,,,,,0,,,,"1 0x7fa36ad3a630 libpthread.so.0 + 0x6ad3a630
2 0x8bc6c8 postgres ExecProcNode (execProcnode.c:1017)
3 0x8bc718 postgres ExecProcNode (execProcnode.c:990)
4 0x8e9e25 postgres ExecResult (tuptable.h:159)
5 0x8bc468 postgres ExecProcNode (execProcnode.c:970)
6 0x8fbf50 postgres ExecMotion (tuptable.h:159)
7 0x8bc548 postgres ExecProcNode (execProcnode.c:1121)
8 0x8b39f9 postgres (tuptable.h:159)
9 0x8b4744 postgres standard_ExecutorRun (execMain.c:2943)
10 0xa8ac57 postgres (pquery.c:1152)
11 0xa8cc41 postgres PortalRun (pquery.c:999)
12 0xa85029 postgres (postgres.c:1389)
13 0xa89c8a postgres PostgresMain (postgres.c:5412)
"
2022-09-09 10:09:23.319435 +04,,,p177432,th1834031232,,,,0,,,seg30,,,,,"LOG","00000","server process (PID 184638) was terminated by signal 11: Segmentation fault","Failed process was running: SELECT count('*') AS cnt

2. The FTS error log

2022-09-09 10:09:44.823765 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS: detected segment is in recovery mode replayed (12/4B525C50) (content=30) primary dbid=57, mirror dbid=57",,,,,,,0,,"ftsprobe.c",259,
2022-09-09 10:09:44.823772 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS: cannot establish libpq connection (content=30, dbid=49): FATAL: the database system is in recovery mode
2022-09-09 10:09:49.051857 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS: segment (content=30, dbid=49, role=p) reported isMirrorUp 0, isInSync 0, isSyncRepEnabled 1, isRoleMirror 0, and retryRequested 1 to the prober.",,,,,,,0,,"ftsprobe.c",626,
2022-09-09 10:09:49.051900 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS max (5) retries exhausted (content=30, dbid=49) state=6",,,,,,,0,,"ftsprobe.c",770,
2022-09-09 10:09:49.051906 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS skipping mirror down update for (content=30) as retryRequested",,,,,,,0,,"ftsprobe.c",1023,
and retryRequested 0 to the prober.",,,,,,,0,,"ftsprobe.c",626,
2022-09-09 10:10:18.023978 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS: ftsConnect (content=30, dbid=49) state=12, retry_count=0, conn->status=-1",,,,,,,0,,"ftsprobe.c",288,
2022-09-09 10:10:18.024220 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS: ftsSend (content=30, dbid=49) state=12, retry_count=0, conn->asyncStatus=-1",,,,,,,0,,"ftsprobe.c",532,
2022-09-09 10:11:02.961662 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS: ftsConnect (content=30, dbid=49) state=0, retry_count=5, conn->status=4",,,,,,,0,,"ftsprobe.c",288,
2022-09-09 10:11:02.961669 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS: detected segment is in recovery mode and not making progress (content=30) primary dbid=49, mirror dbid=57",,,,,,,0,,"ftsprobe.c",246,
2022-09-09 10:11:02.961677 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS: cannot establish libpq connection (content=30, dbid=49): FATAL: the database system is in recovery mode
2022-09-09 10:11:02.962122 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS max (5) retries exhausted (content=30, dbid=49) state=9",,,,,,,0,,"ftsprobe.c",770,
2022-09-09 10:11:02.963312 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS promoting mirror (content=30, dbid=57) to be the new primary",,,,,,,0,,"ftsprobe.c",1109,
2022-09-09 10:11:02.963546 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS: ftsConnect (content=30, dbid=57) state=2, retry_count=0, conn->status=-1",,,,,,,0,,"ftsprobe.c",288,

Sep 09 '22 07:09 dblife1024

Can you paste the plan that led to the PANIC here?

Sep 09 '22 07:09 kainwen

1. The result of EXPLAIN is:

Limit  (cost=0.00..6.00 rows=1 width=8)
  ->  Aggregate  (cost=0.00..6.00 rows=1 width=8)
        ->  Gather Motion 32:1  (slice1; segments: 32)  (cost=0.00..6.00 rows=1 width=1)
              ->  Result  (cost=0.00..6.00 rows=1 width=1)
                    ->  Sequence  (cost=0.00..6.00 rows=1 width=1)
                          ->  Partition Selector for alarm_collection (dynamic scan id: 1)  (cost=10.00..100.00 rows=4 width=4)
                                Partitions selected: 2 (out of 213)
                          ->  Dynamic Index Scan on alarm_collection (dynamic scan id: 1)  (cost=0.00..6.00 rows=1 width=1)
                                Index Cond: ((first_access_time >= 1662667200000::bigint) AND (first_access_time <= 1662703761698::bigint) AND (access_time >= 1662703200000::bigint) AND (access_time <= 1662703761698::bigint))
                                Filter: ((access_time >= 1662703200000::bigint) AND (access_time <= 1662703761698::bigint) AND (first_access_time >= 1662667200000::bigint) AND (first_access_time <= 1662703761698::bigint) AND ((group_ids && '{11}'::integer[]) OR (group_ids && '{28}'::integer[]) OR (group_ids && '{33}'::integer[]) OR (group_ids && '{107}'::integer[]) OR (group_ids && '{10}'::integer[]) OR (group_ids && '{183}'::integer[]) OR (group_ids && '{213}'::integer[]) OR (group_ids && '{218}'::integer[]) OR (group_ids && '{224}'::integer[]) OR (group_ids && '{281}'::integer[]) OR (group_ids && '{289}'::integer[]) OR (group_ids && '{407}'::integer[]) OR (group_ids && '{25}'::integer[])*************【300 parameters】OR (group_ids && '{412}'::integer[])) AND (vendor_id = 1) AND ((branch_id)::text = 'QAAwpAEiM'::text))

2. The result of EXPLAIN ANALYZE is:

ERROR: Error on receive from seg0 slice1 10.3.20.111:6000 pid=176313: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.

Sep 09 '22 09:09 dblife1024

Does anyone else have problems like mine?

Sep 14 '22 03:09 dblife1024

Is anyone looking into this bug?

Oct 14 '22 08:10 dblife1024
