The segment status in gp_segment_configuration is not correct
Bug Report
Greenplum version or build
- 6.26.2
OS version and uname -a
autoconf options used ( config.status --config )
Installation information ( pg_config )
Expected behavior
The failed mirror on gp6sdw4 port 50002 (dbid 10, content 2) should be marked down in gp_segment_configuration (status = d), consistent with gpstate -e:
[gpadmin@gp6mdw greenplum]$ psql postgres -c "select * from gp_segment_configuration order by hostname, port;"
dbid | content | role | preferred_role | mode | status | port | hostname | address | datadir
------+---------+------+----------------+------+--------+-------+----------+---------+----------------------------------------
1 | -1 | p | p | n | u | 5483 | gp6mdw | gp6mdw | /disk1/gpdata/gpmaster/gpseg-1
2 | 0 | p | p | s | u | 40000 | gp6sdw3 | gp6sdw3 | /disk1/gpdata/gpsegment/primary/gpseg0
3 | 1 | p | p | s | u | 40001 | gp6sdw3 | gp6sdw3 | /disk2/gpdata/gpsegment/primary/gpseg1
4 | 2 | p | p | s | u | 40002 | gp6sdw3 | gp6sdw3 | /disk3/gpdata/gpsegment/primary/gpseg2
11 | 3 | m | m | s | u | 50000 | gp6sdw3 | gp6sdw3 | /disk1/gpdata/gpsegment/mirror/gpseg3
12 | 4 | m | m | s | u | 50001 | gp6sdw3 | gp6sdw3 | /disk2/gpdata/gpsegment/mirror/gpseg4
13 | 5 | m | m | s | u | 50002 | gp6sdw3 | gp6sdw3 | /disk3/gpdata/gpsegment/mirror/gpseg5
5 | 3 | p | p | s | u | 40000 | gp6sdw4 | gp6sdw4 | /disk1/gpdata/gpsegment/primary/gpseg3
6 | 4 | p | p | s | u | 40001 | gp6sdw4 | gp6sdw4 | /disk2/gpdata/gpsegment/primary/gpseg4
7 | 5 | p | p | s | u | 40002 | gp6sdw4 | gp6sdw4 | /disk3/gpdata/gpsegment/primary/gpseg5
8 | 0 | m | m | s | u | 50000 | gp6sdw4 | gp6sdw4 | /disk1/gpdata/gpsegment/mirror/gpseg0
9 | 1 | m | m | s | u | 50001 | gp6sdw4 | gp6sdw4 | /disk2/gpdata/gpsegment/mirror/gpseg1
10 | 2 | m | m | s | d | 50002 | gp6sdw4 | gp6sdw4 | /disk3/gpdata/gpsegment/mirror/gpseg2
(13 rows)
Actual behavior
The segment status in gp_segment_configuration is inconsistent with the status reported by gpstate -e: the catalog still marks the mirror on gp6sdw4:50002 (dbid 10) as up (status = u), while gpstate -e reports it as down. The problematic mirror got stuck after it "started streaming WAL from primary at 0/80000000 on timeline 1", and it does not accept connections on port 50002. The specific root cause is unknown.
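The "not correctly listening on port 50002" symptom can be double-checked with a small TCP probe from the master host, independent of telnet. A minimal standard-library sketch (the function name is illustrative; the host and port are the ones from this report):

```python
import socket

def port_accepts_tcp(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, host unreachable, ...
        return False

# For the segment in this report (expected to fail while the mirror is broken):
# port_accepts_tcp("gp6sdw4", 50002)
```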
[gpadmin@gp6mdw greenplum]$ cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
10.191.7.13 gp6mdw
10.191.7.16 gp6sdw3
10.191.7.17 gp6sdw4
[gpadmin@gp6mdw greenplum]$ gpstate -b
20240201:00:28:40:017961 gpstate:gp6mdw:gpadmin-[INFO]:-Starting gpstate with args: -b
20240201:00:28:40:017961 gpstate:gp6mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.26.2 build commit:609ff2bb9ccb7d393d772b29770e757dbd2ecc79'
20240201:00:28:40:017961 gpstate:gp6mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.26.2 build commit:609ff2bb9ccb7d393d772b29770e757dbd2ecc79) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Jan 20 2024 08:06:22'
20240201:00:28:40:017961 gpstate:gp6mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20240201:00:28:40:017961 gpstate:gp6mdw:gpadmin-[INFO]:-Gathering data from segments...
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-Greenplum instance status summary
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-----------------------------------------------------
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Master instance = Active
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Master standby = No master standby configured
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total segment instance count from metadata = 12
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-----------------------------------------------------
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Primary Segment Status
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-----------------------------------------------------
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total primary segments = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total primary segment valid (at master) = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total primary segment failures (at master) = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total number of postmaster.pid files found = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total number of postmaster.pid PIDs found = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total number of /tmp lock files missing = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total number of /tmp lock files found = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total number postmaster processes missing = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total number postmaster processes found = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-----------------------------------------------------
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Mirror Segment Status
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-----------------------------------------------------
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total mirror segments = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total mirror segment valid (at master) = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total mirror segment failures (at master) = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total number of postmaster.pid files found = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total number of postmaster.pid PIDs found = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total number of /tmp lock files missing = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total number of /tmp lock files found = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total number postmaster processes missing = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total number postmaster processes found = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total number mirror segments acting as primary segments = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:- Total number mirror segments acting as mirror segments = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-----------------------------------------------------
[gpadmin@gp6mdw greenplum]$ gpstate -e
20240201:00:28:48:018019 gpstate:gp6mdw:gpadmin-[INFO]:-Starting gpstate with args: -e
20240201:00:28:48:018019 gpstate:gp6mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.26.2 build commit:609ff2bb9ccb7d393d772b29770e757dbd2ecc79'
20240201:00:28:48:018019 gpstate:gp6mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.26.2 build commit:609ff2bb9ccb7d393d772b29770e757dbd2ecc79) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Jan 20 2024 08:06:22'
20240201:00:28:48:018019 gpstate:gp6mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20240201:00:28:48:018019 gpstate:gp6mdw:gpadmin-[INFO]:-Gathering data from segments...
20240201:00:28:49:018019 gpstate:gp6mdw:gpadmin-[INFO]:-----------------------------------------------------
20240201:00:28:49:018019 gpstate:gp6mdw:gpadmin-[INFO]:-Segment Mirroring Status Report
20240201:00:28:49:018019 gpstate:gp6mdw:gpadmin-[INFO]:-----------------------------------------------------
20240201:00:28:49:018019 gpstate:gp6mdw:gpadmin-[INFO]:-Downed Segments (may include segments where status could not be retrieved)
20240201:00:28:49:018019 gpstate:gp6mdw:gpadmin-[INFO]:- Segment Port Config status Status
20240201:00:28:49:018019 gpstate:gp6mdw:gpadmin-[INFO]:- gp6sdw4 50002 Up Down
[gpadmin@gp6mdw greenplum]$ telnet gp6sdw4 50002
Trying 10.191.7.17...
telnet: Unable to connect to remote host: Connection refused
[gpadmin@gp6mdw greenplum]$ ssh gp6sdw4
Last login: Thu Feb 1 00:24:43 2024 from gp6mdw
[gpadmin@gp6sdw4 ~]$ ps -ef | grep 50002
gpadmin 96 1 0 Jan31 ? 00:00:00 /usr/local/greenplum-db-6.26.2/bin/postgres -D /disk3/gpdata/gpsegment/mirror/gpseg2 -p 50002
gpadmin 103 96 0 Jan31 ? 00:00:00 postgres: 50002, logger process
gpadmin 104 96 0 Jan31 ? 00:00:02 postgres: 50002, startup process recovering 000000010000000000000020
gpadmin 140 96 0 Jan31 ? 00:00:00 postgres: 50002, checkpointer process
gpadmin 142 96 0 Jan31 ? 00:00:00 postgres: 50002, writer process
gpadmin 267 96 0 Jan31 ? 00:00:08 postgres: 50002, wal receiver process streaming 0/806D1090
gpadmin 6556 98 0 Jan31 ? 00:00:01 postgres: 40001, gpadmin postgres 10.191.7.13(50002) con548 seg4 idle
gpadmin 24154 24125 0 00:29 pts/2 00:00:00 grep --color=auto 50002
[gpadmin@gp6sdw4 greenplum]$ netstat -nltp | grep 50002
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp6 0 0 :::50002 :::* LISTEN 96/postgres
[gpadmin@gp6sdw4 ~]$ exit
logout
Connection to gp6sdw4 closed.
[gpadmin@gp6mdw greenplum]$
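The netstat output above is itself a clue: on Linux, a wildcard tcp6 listener (`:::50002`) normally accepts IPv4 clients as well via IPv4-mapped addresses, so "LISTEN in netstat" combined with "Connection refused over IPv4" is the signature of an IPv6-only socket (e.g. `net.ipv6.bindv6only=1` or `IPV6_V6ONLY` set on the socket). This is an assumption worth ruling out, not a confirmed diagnosis for this incident; a self-contained sketch of the behavior:

```python
import socket

def demo_v6only_listen():
    """Bind an IPv6-only wildcard listener, then probe it over IPv6 and IPv4.

    Returns (ipv6_ok, ipv4_ok). With IPV6_V6ONLY=1 the socket still shows up
    as a tcp6 LISTEN entry (like ":::50002" above) but refuses IPv4 clients.
    """
    srv = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    srv.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1)
    srv.bind(("::", 0))               # wildcard bind on an ephemeral port
    srv.listen(2)
    port = srv.getsockname()[1]

    def probe(family, addr):
        s = socket.socket(family, socket.SOCK_STREAM)
        s.settimeout(2.0)
        try:
            s.connect((addr, port))
            return True
        except OSError:               # e.g. ECONNREFUSED on the IPv4 path
            return False
        finally:
            s.close()

    try:
        return probe(socket.AF_INET6, "::1"), probe(socket.AF_INET, "127.0.0.1")
    finally:
        srv.close()
```

If the IPv4 probe fails while the IPv6 probe succeeds against the real mirror, the problem would be on the socket/bind side of that postmaster rather than in the catalog.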
[gpadmin@gp6mdw greenplum]$ psql postgres -c "select * from gp_segment_configuration order by hostname, port;"
dbid | content | role | preferred_role | mode | status | port | hostname | address | datadir
------+---------+------+----------------+------+--------+-------+----------+---------+----------------------------------------
1 | -1 | p | p | n | u | 5483 | gp6mdw | gp6mdw | /disk1/gpdata/gpmaster/gpseg-1
2 | 0 | p | p | s | u | 40000 | gp6sdw3 | gp6sdw3 | /disk1/gpdata/gpsegment/primary/gpseg0
3 | 1 | p | p | s | u | 40001 | gp6sdw3 | gp6sdw3 | /disk2/gpdata/gpsegment/primary/gpseg1
4 | 2 | p | p | s | u | 40002 | gp6sdw3 | gp6sdw3 | /disk3/gpdata/gpsegment/primary/gpseg2
11 | 3 | m | m | s | u | 50000 | gp6sdw3 | gp6sdw3 | /disk1/gpdata/gpsegment/mirror/gpseg3
12 | 4 | m | m | s | u | 50001 | gp6sdw3 | gp6sdw3 | /disk2/gpdata/gpsegment/mirror/gpseg4
13 | 5 | m | m | s | u | 50002 | gp6sdw3 | gp6sdw3 | /disk3/gpdata/gpsegment/mirror/gpseg5
5 | 3 | p | p | s | u | 40000 | gp6sdw4 | gp6sdw4 | /disk1/gpdata/gpsegment/primary/gpseg3
6 | 4 | p | p | s | u | 40001 | gp6sdw4 | gp6sdw4 | /disk2/gpdata/gpsegment/primary/gpseg4
7 | 5 | p | p | s | u | 40002 | gp6sdw4 | gp6sdw4 | /disk3/gpdata/gpsegment/primary/gpseg5
8 | 0 | m | m | s | u | 50000 | gp6sdw4 | gp6sdw4 | /disk1/gpdata/gpsegment/mirror/gpseg0
9 | 1 | m | m | s | u | 50001 | gp6sdw4 | gp6sdw4 | /disk2/gpdata/gpsegment/mirror/gpseg1
10 | 2 | m | m | s | u | 50002 | gp6sdw4 | gp6sdw4 | /disk3/gpdata/gpsegment/mirror/gpseg2
(13 rows)
[gpadmin@gp6mdw greenplum]$
[gpadmin@gp6mdw greenplum]$ psql postgres -c "select * from gp_stat_replication order by gp_Segment_id;"
gp_segment_id | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state | sync_error
---------------+-----+----------+---------+------------------+-------------+-----------------+-------------+------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------+------------
0 | 195 | 10 | gpadmin | gp_walreceiver | 10.191.7.17 | | 60234 | 2024-01-31 19:18:50.937193+08 | | streaming | 0/8062D8B0 | 0/8062D8B0 | 0/8062D8B0 | 0/8062D8B0 | 1 | sync | none
1 | 214 | 10 | gpadmin | gp_walreceiver | 10.191.7.17 | | 50022 | 2024-01-31 19:18:55.503663+08 | | streaming | 0/8044FF50 | 0/8044FF50 | 0/8044FF50 | 0/8044FF50 | 1 | sync | none
2 | 216 | 10 | gpadmin | gp_walreceiver | 10.191.7.17 | | 65492 | 2024-01-31 19:18:55.746114+08 | | streaming | 0/806D5CB0 | 0/806D5CB0 | 0/806D5CB0 | 0/806D5CB0 | 1 | sync | none
3 | 263 | 10 | gpadmin | gp_walreceiver | 10.191.7.16 | | 30002 | 2024-01-31 19:18:55.535799+08 | | streaming | 0/80510300 | 0/80510300 | 0/80510300 | 0/80510300 | 1 | sync | none
4 | 266 | 10 | gpadmin | gp_walreceiver | 10.191.7.16 | | 49112 | 2024-01-31 19:18:55.714957+08 | | streaming | 0/8272BC28 | 0/8272BC28 | 0/8272BC28 | 0/8272BC28 | 1 | sync | none
5 | 264 | 10 | gpadmin | gp_walreceiver | 10.191.7.16 | | 51718 | 2024-01-31 19:18:55.550176+08 | | streaming | 0/80562E30 | 0/80562E30 | 0/80562E30 | 0/80562E30 | 1 | sync | none
(6 rows)