gpdb
Segment Unexpected internal error: Segment process received signal SIGSEGV
Bug Report
Greenplum version or build
Greenplum Database 6.19.4 build commit:953778b47d418bb463e4abb2d982ba27dd281010 Open Source
OS version and uname -a
Linux mdw 3.10.0-1160.59.1.el7.x86_64 #1 SMP Wed Feb 23 16:47:03 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
autoconf options used ( config.status --config )
Installation information ( pg_config )
postgresql.conf
checkpoint_segments=8
gp_contentid=-1
gp_vmem_protect_limit=75018
work_mem=2048MB
effective_cache_size=128GB
maintenance_work_mem=4GB
log_min_duration_statement=10s
log_statement=none
log_filename='gpdb-%a.log'
log_truncate_on_rotation=on
log_rotation_age=1d
statement_mem=1024MB
max_statement_mem=2048MB
statement_timeout=1440s
gp_enable_global_deadlock_detector=on
gp_resqueue_priority_cpucores_per_segment=96
max_prepared_transactions=1000
max_connections=1000
gp_enable_query_metrics=on
shared_preload_libraries='metrics_collector'

free -m
              total        used        free      shared  buff/cache   available
Mem:         772426       19227      679453       30139       73744      721221
Swap:             0           0           0
Expected behavior
Actual behavior
Step to reproduce the behavior
1: My table definition is:
CREATE TABLE public.table1(
    id bigint NOT NULL,
    first_access_time bigint,
    access_time bigint,
    ..... .... ...
)
DISTRIBUTED BY (id)
PARTITION BY RANGE(first_access_time)
(
    PARTITION pn_46 START (1646683200000::bigint) END (1646769600000::bigint) EVERY (86400000::bigint) WITH (tablename='alarm_collection_1_prt_pn_46', appendonly='false'),
    PARTITION pn_47 START (1646769600000::bigint) END (1646856000000::bigint) EVERY (86400000::bigint) WITH (tablename='alarm_collection_1_prt_pn_47', appendonly='false'),
    .... ... ..(216 partition)
    DEFAULT PARTITION other WITH (tablename='alarm_collection_1_prt_other', appendonly='false')
);
2: select count(*) from table1;
 count
 61864
(1 row)
3: My SQL is:
select count(*) from table1 WHERE ((table1.group_ids && '{8}') OR (table1.group_ids && '{35}') OR (table1.group_ids && '{61}') OR (table1.group_ids && '{91}') OR (table1.group_ids && '{162}') OR (table1.group_ids && '{196}') OR (table1.group_ids && '{199}') OR ,......... ,......... ,......... (tens of thousands of such conditions)
After this query is executed, the segment PANICs.
4: The pg_log is:
2022-09-08 13:04:39.208390 +04,,,p314615,th332970112,"10.3.206.110","26080",2022-09-08 13:04:39 +04,0,,,seg7,,,,,"LOG","00000","PID 313773 in cancel request did not match any process",,,,,,,0,,"postmaster.c",2724,
2022-09-08 13:04:39.555222 +04,"gpadmin",,p314695,th332970112,"10.3.206.110","26528",2022-09-08 13:04:39 +04,0,,,seg7,,,,,"LOG","00000","requesting fts retry as mirror didn't connect yet but in grace period: 3","pid zero at time: 0 accept connections start time: 1662627876",,,,,,0,,"gp_replication.c",535,
2022-09-08 13:04:40.068181 +04,,,p314639,th0,,,2022-09-08 13:04:39 +04,0,con272093585,cmd17,seg7,slice1,,,,"PANIC","XX000","Unexpected internal error: Segment process received signal SIGSEGV",,,,,,,0,,,,"1 0x7f66115b9630 libpthread.so.0
Hi, could you please provide the following:
- coredump
- minirepro (documentation on how to use it: https://community.pivotal.io/s/article/How-to-Collect-DDL-and-Statistics-Information-Using-the-Minirepro-Utility?language=en_US)
I'm very sorry, but I can't find the coredump; it seems that none was generated. I also work in the security industry under a confidentiality agreement, so I can't send the minirepro output. Can you help me based on the following error logs?
1. The segment error log
Judging by the start time of the segment process, the segment process was not shut down, but the segment log contains some errors:
2022-09-09 10:05:54.127932 +04,,,p271674,th-866776960,,,,0,,,seg30,,,,,"LOG","00000","received fast shutdown request",,,,,,,0,,"postmaster.c",3090,
2022-09-09 10:05:54.127966 +04,,,p271674,th-866776960,,,,0,,,seg30,,,,,"LOG","00000","aborting any active transactions",,,,,,,0,,"postmaster.c",3116,
2022-09-09 10:05:54.128163 +04,,,p271807,th-866776960,,,,0,,,seg30,,,,,"FATAL","57P01","terminating background worker ""sweeper process"" due to administrator command",,,,,,,0,,"bgworker.c",565,
2022-09-09 10:05:54.130843 +04,,,p271674,th-866776960,,,,0,,,seg30,,,,,"LOG","00000","worker process: sweeper process (PID 271807) exited with exit code 1",,,,,,,0,,"postmaster.c",3965,
2022-09-09 10:05:54.131206 +04,,,p271674,th-866776960,,,,0,,,seg30,,,,,"LOG","00000","worker process: metrics collector (PID 271972) exited with exit code 1",,,,,,,0,,"postmaster.c",3965,
2022-09-09 10:05:54.734010 +04,,,p271803,th-866776960,,,,0,,,seg30,,,,,"LOG","00000","shutting down",,,,,,,0,,"xlog.c",8433,
2022-09-09 10:05:54.777894 +04,,,p271803,th-866776960,,,,0,,,seg30,,,,,"LOG","00000","database system is shut down",,,,,,,0,,"xlog.c",8468,
2022-09-09 10:06:48.232477 +04,,,p177466,th1834031232,,,,0,,,seg30,,,,,"LOG","00000","database system was shut down at 2022-09-09 10:05:54 +04",,,,,,,0,,"xlog.c",6421,
2022-09-09 10:06:48.232588 +04,"gpadmin",,p177467,th1834031232,"10.3.206.113","49328",2022-09-09 10:06:48 +04,0,,,seg30,,,,,"FATAL","57P03","the database system is starting up","last replayed record at 0/0",,,,,,0,,"postmaster.c",2556,
2022-09-09 10:06:48.232601 +04,"gpadmin","postgres",p177469,th1834031232,"[local]",,2022-09-09 10:06:48 +04,0,,,seg30,,,,,"FATAL","57P03","the database system is starting up","last replayed record at 0/0",,,,,,0,,"postmaster.c",2556,
2022-09-09 10:06:48.232611 +04,,,p177466,th1834031232,,,,0,,,seg30,,,,,"LOG","00000","end of transaction log location is 12/4B4BD5F0",,,,,,,0,,"xlog.c",7476,
2022-09-09 10:06:48.232620 +04,"gpadmin","postgres",p177469,th1834031232,"[local]",,2022-09-09 10:06:48 +04,0,,,seg30,,,,,"LOG","08006","could not send data to client: Broken pipe",,,,,,,0,,"pqcomm.c",1586,
2022-09-09 10:06:48.233372 +04,"gpadmin","postgres",p177470,th1834031232,"[local]",,2022-09-09 10:06:48 +04,0,,,seg30,,,,,"FATAL","57P03","the database system is starting up","last replayed record at 12/4B4BD570",,,,,,0,,"postmaster.c",2556,
2022-09-09 10:06:48.233831 +04,"gpadmin","postgres",p177468,th1834031232,"[local]",,2022-09-09 10:06:48 +04,0,,,seg30,,,,,"FATAL","57P03","the database system is starting up","last replayed record at 12/4B4BD570",,,,,,0,,"postmaster.c",2556,
2022-09-09 10:06:48.233962 +04,"gpadmin","postgres",p177468,th1834031232,"[local]",,2022-09-09 10:06:48 +04,0,,,seg30,,,,,"LOG","08006","could not send data to client: Broken pipe",,,,,,,0,,"pqcomm.c",1586,
2022-09-09 10:06:48.234392 +04,,,p177466,th1834031232,,,,0,,,seg30,,,,,"LOG","00000","latest completed transaction id is 1051406 and next transaction id is 1051407",,,,,,,0,,"xlog.c",7764,
2022-09-09 10:06:48.234763 +04,,,p177466,th1834031232,,,,0,,,seg30,,,,,"LOG","00000","MultiXact member wraparound protections are now enabled",,,,,,,0,,"multixact.c",2619,
2022-09-09 10:06:48.234781 +04,,,p177466,th1834031232,,,,0,,,seg30,,,,,"LOG","00000","database system is ready",,,,,,,0,,"xlog.c",7788,
2022-09-09 10:06:48.237422 +04,,,p177432,th1834031232,,,,0,,,seg30,,,,,"LOG","00000","starting background worker process ""sweeper process""",,,,,,,0,,"postmaster.c",6169,
2022-09-09 10:06:48.237669 +04,,,p177432,th1834031232,,,,0,,,seg30,,,,,"LOG","00000","PostgreSQL 9.4.26 (Greenplum Database 6.19.4 build commit:953778b47d418bb463e4abb2d982ba27dd281010 Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Mar 9 2022 00:51:26",,,,,,,0,,"postmaster.c",3290,
2022-09-09 10:06:48.237687 +04,,,p177432,th1834031232,,,,0,,,seg30,,,,,"LOG","00000","database system is ready to accept connections","PostgreSQL 9.4.26 (Greenplum Database 6.19.4 build commit:953778b47d418bb463e4abb2d982ba27dd281010 Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Mar 9 2022 00:51:26",,,,,,0,,"postmaster.c",3294,
2022-09-09 10:06:48.439391 +04,"gpadmin",,p177534,th1834031232,"10.3.206.113","49342",2022-09-09 10:06:48 +04,0,,,seg30,,,,,"LOG","00000","standby ""gp_walreceiver"" is now the synchronous standby with priority 1",,,,,,,0,,"syncrep.c",578,
2022-09-09 10:07:01.610097 +04,,,p177432,th1834031232,,,,0,,,seg30,,,,,"LOG","00000","registering background worker ""metrics collector""",,,,,,,0
2022-09-09 10:09:18.454108 +04,,,p177476,th1834031232,,,,0,,,seg30,,,,,"LOG","00000","activeWeight underflow!",,,,,,,0,,"backoff.c",934,
2022-09-09 10:09:23.319351 +04,,,p184638,th0,,,2022-09-09 10:09:22 +04,0,con212,cmd34,seg30,slice1,,,,"PANIC","XX000","Unexpected internal error: Segment process received signal SIGSEGV",,,,,,,0,,,,"1 0x7fa36ad3a630 libpthread.so.0
2. The FTS error log
2022-09-09 10:09:44.823765 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS: detected segment is in recovery mode replayed (12/4B525C50) (content=30) primary dbid=57, mirror dbid=57",,,,,,,0,,"ftsprobe.c",259,
2022-09-09 10:09:44.823772 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS: cannot establish libpq connection (content=30, dbid=49): FATAL: the database system is in recovery mode
2022-09-09 10:09:49.051857 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS: segment (content=30, dbid=49, role=p) reported isMirrorUp 0, isInSync 0, isSyncRepEnabled 1, isRoleMirror 0, and retryRequested 1 to the prober.",,,,,,,0,,"ftsprobe.c",626,
2022-09-09 10:09:49.051900 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS max (5) retries exhausted (content=30, dbid=49) state=6",,,,,,,0,,"ftsprobe.c",770,
2022-09-09 10:09:49.051906 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS skipping mirror down update for (content=30) as retryRequested",,,,,,,0,,"ftsprobe.c",1023,
and retryRequested 0 to the prober.",,,,,,,0,,"ftsprobe.c",626,
2022-09-09 10:10:18.023978 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS: ftsConnect (content=30, dbid=49) state=12, retry_count=0, conn->status=-1",,,,,,,0,,"ftsprobe.c",288,
2022-09-09 10:10:18.024220 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS: ftsSend (content=30, dbid=49) state=12, retry_count=0, conn->asyncStatus=-1",,,,,,,0,,"ftsprobe.c",532,
2022-09-09 10:11:02.961662 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS: ftsConnect (content=30, dbid=49) state=0, retry_count=5, conn->status=4",,,,,,,0,,"ftsprobe.c",288,
2022-09-09 10:11:02.961669 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS: detected segment is in recovery mode and not making progress (content=30) primary dbid=49, mirror dbid=57",,,,,,,0,,"ftsprobe.c",246,
2022-09-09 10:11:02.961677 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS: cannot establish libpq connection (content=30, dbid=49): FATAL: the database system is in recovery mode
2022-09-09 10:11:02.962122 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS max (5) retries exhausted (content=30, dbid=49) state=9",,,,,,,0,,"ftsprobe.c",770,
2022-09-09 10:11:02.963312 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS promoting mirror (content=30, dbid=57) to be the new primary",,,,,,,0,,"ftsprobe.c",1109,
2022-09-09 10:11:02.963546 +04,,,p164544,th943114368,,,,0,con2,,seg-1,,,,,"LOG","00000","FTS: ftsConnect (content=30, dbid=57) state=2, retry_count=0, conn->status=-1",,,,,,,0,,"ftsprobe.c",288,
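The log entries quoted in this thread are comma-separated, so the PANIC lines can be picked out of a large segment log mechanically instead of by eye. The following is only a sketch: the function name `find_panics` is hypothetical, and the field positions (severity in the 17th field, message in the 19th) are inferred from the sample lines above, not taken from the Greenplum documentation.

```python
import csv

# 0-based field positions, inferred from the log lines quoted in this thread.
# These are assumptions; check them against your own log files.
TS, PID, SEGMENT, SEVERITY, MESSAGE = 0, 3, 11, 16, 18


def find_panics(lines):
    """Return (timestamp, pid, segment, message) for every PANIC entry."""
    hits = []
    for line in lines:
        # Parse one line at a time so that a truncated stack-trace field
        # (an unterminated quote) cannot swallow the entries that follow.
        row = next(csv.reader([line]), [])
        if len(row) > MESSAGE and row[SEVERITY] == "PANIC":
            hits.append((row[TS], row[PID], row[SEGMENT], row[MESSAGE]))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_panics.py gpdb-Thu.log
    with open(sys.argv[1], newline="") as fh:
        for hit in find_panics(fh):
            print(hit)
```

Continuation lines of a multi-line stack trace do not have enough fields and are simply skipped, so only the first line of each PANIC entry is reported.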
Can you paste the plan that led to the PANIC here?
1. The result of EXPLAIN is:
Limit  (cost=0.00..6.00 rows=1 width=8)
  ->  Aggregate  (cost=0.00..6.00 rows=1 width=8)
        ->  Gather Motion 32:1  (slice1; segments: 32)  (cost=0.00..6.00 rows=1 width=1)
              ->  Result  (cost=0.00..6.00 rows=1 width=1)
                    ->  Sequence  (cost=0.00..6.00 rows=1 width=1)
                          ->  Partition Selector for alarm_collection (dynamic scan id: 1)  (cost=10.00..100.00 rows=4 width=4)
                                Partitions selected: 2 (out of 213)
                          ->  Dynamic Index Scan on alarm_collection (dynamic scan id: 1)  (cost=0.00..6.00 rows=1 width=1)
                                Index Cond: ((first_access_time >= 1662667200000::bigint) AND (first_access_time <= 1662703761698::bigint) AND (access_time >= 1662703200000::bigint) AND (access_time <= 1662703761698::bigint))
                                Filter: ((access_time >= 1662703200000::bigint) AND (access_time <= 1662703761698::bigint) AND (first_access_time >= 1662667200000::bigint) AND (first_access_time <= 1662703761698::bigint) AND ((group_ids && '{11}'::integer[]) OR (group_ids && '{28}'::integer[]) OR (group_ids && '{33}'::integer[]) OR (group_ids && '{107}'::integer[]) OR (group_ids && '{10}'::integer[]) OR (group_ids && '{183}'::integer[]) OR (group_ids && '{213}'::integer[]) OR (group_ids && '{218}'::integer[]) OR (group_ids && '{224}'::integer[]) OR (group_ids && '{281}'::integer[]) OR (group_ids && '{289}'::integer[]) OR (group_ids && '{407}'::integer[]) OR (group_ids && '{25}'::integer[]) ************* [300 parameters] OR (group_ids && '{412}'::integer[])) AND (vendor_id = 1) AND ((branch_id)::text = 'QAAwpAEiM'::text))
2. The result of EXPLAIN ANALYZE is:
ERROR: Error on receive from seg0 slice1 10.3.20.111:6000 pid=176313: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
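Every branch of the OR chain in the filter above has the form `group_ids && '{n}'::integer[]`. An array overlaps `{a}` or overlaps `{b}` exactly when it overlaps `{a,b}`, so the whole chain is logically equivalent to one overlap test against a single array holding all the ids. Whether the shorter expression avoids the SIGSEGV has not been tested here; this is only a sketch of a possible workaround to try while the crash is investigated. The helper name `collapse_overlap_chain` is hypothetical, and the `integer[]` cast is taken from the plan output.

```python
def collapse_overlap_chain(column, ids):
    """Build one `column && '{...}'::integer[]` predicate from many ids.

    Equivalent to OR-ing `column && '{id}'` over every id in `ids`.
    """
    # int() rejects anything that is not a plain integer, which also keeps
    # arbitrary text out of the generated SQL literal.
    unique_ids = sorted({int(i) for i in ids})
    if not unique_ids:
        raise ValueError("at least one id is required")
    literal = ",".join(str(i) for i in unique_ids)
    return f"{column} && '{{{literal}}}'::integer[]"


if __name__ == "__main__":
    # Ids taken from the query quoted earlier in this thread.
    print(collapse_overlap_chain("group_ids", [8, 35, 61, 91, 162, 196, 199]))
```

The generated predicate replaces only the OR chain; the remaining conditions of the original query (the time ranges, `vendor_id`, `branch_id`) stay as they are.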
Does anyone else have problems like mine?
Is anyone looking into this bug?