
The segment status in gp_segment_configuration is not correct

Open · cobolbaby opened this issue on Jan 31, 2024 · 2 comments

Bug Report

Greenplum version or build

  • 6.26.2

OS version and uname -a

autoconf options used ( config.status --config )

Installation information ( pg_config )

Expected behavior

The failed mirror on gp6sdw4 (dbid 10, content 2, port 50002) should be marked down (status = 'd'), as in the output below:

[gpadmin@gp6mdw greenplum]$ psql postgres -c "select * from gp_segment_configuration order by hostname, port;"
 dbid | content | role | preferred_role | mode | status | port  | hostname | address |                datadir
------+---------+------+----------------+------+--------+-------+----------+---------+----------------------------------------
    1 |      -1 | p    | p              | n    | u      |  5483 | gp6mdw   | gp6mdw  | /disk1/gpdata/gpmaster/gpseg-1
    2 |       0 | p    | p              | s    | u      | 40000 | gp6sdw3  | gp6sdw3 | /disk1/gpdata/gpsegment/primary/gpseg0
    3 |       1 | p    | p              | s    | u      | 40001 | gp6sdw3  | gp6sdw3 | /disk2/gpdata/gpsegment/primary/gpseg1
    4 |       2 | p    | p              | s    | u      | 40002 | gp6sdw3  | gp6sdw3 | /disk3/gpdata/gpsegment/primary/gpseg2
   11 |       3 | m    | m              | s    | u      | 50000 | gp6sdw3  | gp6sdw3 | /disk1/gpdata/gpsegment/mirror/gpseg3
   12 |       4 | m    | m              | s    | u      | 50001 | gp6sdw3  | gp6sdw3 | /disk2/gpdata/gpsegment/mirror/gpseg4
   13 |       5 | m    | m              | s    | u      | 50002 | gp6sdw3  | gp6sdw3 | /disk3/gpdata/gpsegment/mirror/gpseg5
    5 |       3 | p    | p              | s    | u      | 40000 | gp6sdw4  | gp6sdw4 | /disk1/gpdata/gpsegment/primary/gpseg3
    6 |       4 | p    | p              | s    | u      | 40001 | gp6sdw4  | gp6sdw4 | /disk2/gpdata/gpsegment/primary/gpseg4
    7 |       5 | p    | p              | s    | u      | 40002 | gp6sdw4  | gp6sdw4 | /disk3/gpdata/gpsegment/primary/gpseg5
    8 |       0 | m    | m              | s    | u      | 50000 | gp6sdw4  | gp6sdw4 | /disk1/gpdata/gpsegment/mirror/gpseg0
    9 |       1 | m    | m              | s    | u      | 50001 | gp6sdw4  | gp6sdw4 | /disk2/gpdata/gpsegment/mirror/gpseg1
   10 |       2 | m    | m              | s    | d      | 50002 | gp6sdw4  | gp6sdw4 | /disk3/gpdata/gpsegment/mirror/gpseg2
(13 rows)
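
In gp_segment_configuration, status 'u' means up and 'd' means down, so the expected state above flags only the stuck mirror (dbid 10) as down. A quick way to list just the down segments (a sketch against the same catalog queried above):

psql postgres -c "select dbid, content, role, hostname, port from gp_segment_configuration where status = 'd';"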

Actual behavior

The segment status in gp_segment_configuration is inconsistent with the status reported by gpstate -e: the catalog still shows the mirror on gp6sdw4:50002 (dbid 10, content 2) as up ('u'), while gpstate -e reports it as down.

The problematic mirror got stuck after it "started streaming WAL from primary at 0/80000000 on timeline 1", and it can no longer be reached on port 50002 (telnet from the master is refused, even though netstat on the segment host shows a tcp6 listener; see below). The specific reason is unknown.
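
A possible next diagnostic step (a sketch: the data directory path is taken from this report, and the exact log filename is illustrative) is to check the stuck mirror's server log on gp6sdw4 for its last recovery and streaming messages:

# on gp6sdw4 -- Greenplum 6 segments write CSV logs under <datadir>/pg_log
cd /disk3/gpdata/gpsegment/mirror/gpseg2/pg_log
tail -n 100 "$(ls -t | head -1)"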

[gpadmin@gp6mdw greenplum]$ cat /etc/hosts
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
10.191.7.13	gp6mdw
10.191.7.16	gp6sdw3
10.191.7.17	gp6sdw4
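
Note that /etc/hosts maps the cluster hostnames to IPv4 addresses only; there are no IPv6 entries for gp6mdw, gp6sdw3, or gp6sdw4. A quick resolution check (sketch):

getent hosts gp6sdw4
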
[gpadmin@gp6mdw greenplum]$ gpstate -b
20240201:00:28:40:017961 gpstate:gp6mdw:gpadmin-[INFO]:-Starting gpstate with args: -b
20240201:00:28:40:017961 gpstate:gp6mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.26.2 build commit:609ff2bb9ccb7d393d772b29770e757dbd2ecc79'
20240201:00:28:40:017961 gpstate:gp6mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.26.2 build commit:609ff2bb9ccb7d393d772b29770e757dbd2ecc79) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Jan 20 2024 08:06:22'
20240201:00:28:40:017961 gpstate:gp6mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20240201:00:28:40:017961 gpstate:gp6mdw:gpadmin-[INFO]:-Gathering data from segments...
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-Greenplum instance status summary
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-----------------------------------------------------
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Master instance                                           = Active
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Master standby                                            = No master standby configured
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total segment instance count from metadata                = 12
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-----------------------------------------------------
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Primary Segment Status
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-----------------------------------------------------
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total primary segments                                    = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total primary segment valid (at master)                   = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total primary segment failures (at master)                = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total number of postmaster.pid files missing              = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total number of postmaster.pid files found                = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs missing               = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs found                 = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total number of /tmp lock files missing                   = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total number of /tmp lock files found                     = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total number postmaster processes missing                 = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total number postmaster processes found                   = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-----------------------------------------------------
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Mirror Segment Status
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-----------------------------------------------------
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total mirror segments                                     = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total mirror segment valid (at master)                    = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total mirror segment failures (at master)                 = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total number of postmaster.pid files missing              = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total number of postmaster.pid files found                = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs missing               = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs found                 = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total number of /tmp lock files missing                   = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total number of /tmp lock files found                     = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total number postmaster processes missing                 = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total number postmaster processes found                   = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total number mirror segments acting as primary segments   = 0
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-   Total number mirror segments acting as mirror segments    = 6
20240201:00:28:41:017961 gpstate:gp6mdw:gpadmin-[INFO]:-----------------------------------------------------
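
gpstate -b reports all 6 mirror postmaster.pid files, PIDs, lock files, and processes present; that is consistent with the ps output later in this report, where the stuck mirror's postmaster and its children are still alive, so process-level checks pass. gpstate's detailed per-segment report could add more context (sketch; output omitted):

gpstate -s
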
[gpadmin@gp6mdw greenplum]$ gpstate -e
20240201:00:28:48:018019 gpstate:gp6mdw:gpadmin-[INFO]:-Starting gpstate with args: -e
20240201:00:28:48:018019 gpstate:gp6mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.26.2 build commit:609ff2bb9ccb7d393d772b29770e757dbd2ecc79'
20240201:00:28:48:018019 gpstate:gp6mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.26.2 build commit:609ff2bb9ccb7d393d772b29770e757dbd2ecc79) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Jan 20 2024 08:06:22'
20240201:00:28:48:018019 gpstate:gp6mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20240201:00:28:48:018019 gpstate:gp6mdw:gpadmin-[INFO]:-Gathering data from segments...
20240201:00:28:49:018019 gpstate:gp6mdw:gpadmin-[INFO]:-----------------------------------------------------
20240201:00:28:49:018019 gpstate:gp6mdw:gpadmin-[INFO]:-Segment Mirroring Status Report
20240201:00:28:49:018019 gpstate:gp6mdw:gpadmin-[INFO]:-----------------------------------------------------
20240201:00:28:49:018019 gpstate:gp6mdw:gpadmin-[INFO]:-Downed Segments (may include segments where status could not be retrieved)
20240201:00:28:49:018019 gpstate:gp6mdw:gpadmin-[INFO]:-   Segment   Port    Config status   Status
20240201:00:28:49:018019 gpstate:gp6mdw:gpadmin-[INFO]:-   gp6sdw4   50002   Up              Down

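In the gpstate -e report, "Config status" is read from gp_segment_configuration on the master, while "Status" comes from probing the segment host directly, so "Up / Down" is exactly the mismatch described above: the catalog says the mirror is up, but the live probe finds it down. gpstate's mirror-specific report could add detail here (sketch; output omitted):

gpstate -m
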
[gpadmin@gp6mdw greenplum]$ telnet gp6sdw4 50002
Trying 10.191.7.17...
telnet: Unable to connect to remote host: Connection refused
[gpadmin@gp6mdw greenplum]$ ssh gp6sdw4
Last login: Thu Feb  1 00:24:43 2024 from gp6mdw
[gpadmin@gp6sdw4 ~]$ ps -ef | grep 50002
gpadmin       96       1  0 Jan31 ?        00:00:00 /usr/local/greenplum-db-6.26.2/bin/postgres -D /disk3/gpdata/gpsegment/mirror/gpseg2 -p 50002
gpadmin      103      96  0 Jan31 ?        00:00:00 postgres: 50002, logger process
gpadmin      104      96  0 Jan31 ?        00:00:02 postgres: 50002, startup process   recovering 000000010000000000000020
gpadmin      140      96  0 Jan31 ?        00:00:00 postgres: 50002, checkpointer process
gpadmin      142      96  0 Jan31 ?        00:00:00 postgres: 50002, writer process
gpadmin      267      96  0 Jan31 ?        00:00:08 postgres: 50002, wal receiver process   streaming 0/806D1090
gpadmin     6556      98  0 Jan31 ?        00:00:01 postgres: 40001, gpadmin postgres 10.191.7.13(50002) con548 seg4 idle
gpadmin    24154   24125  0 00:29 pts/2    00:00:00 grep --color=auto 50002
[gpadmin@gp6sdw4 greenplum]$ netstat -nltp | grep 50002
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp6       0      0 :::50002                :::*                    LISTEN      96/postgres
[gpadmin@gp6sdw4 ~]$ exit
logout
Connection to gp6sdw4 closed.
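
netstat shows the postmaster bound only as a tcp6 listener on :::50002, yet the IPv4 telnet from the master is refused. Two checks that could narrow this down (a sketch: the postgresql.conf path is taken from this report, and neither setting is confirmed to be the cause):

# on gp6sdw4: does the mirror bind all addresses, and is the kernel
# restricting IPv6 wildcard sockets to IPv6-only traffic?
grep -E '^[[:space:]]*listen_addresses' /disk3/gpdata/gpsegment/mirror/gpseg2/postgresql.conf
sysctl net.ipv6.bindv6only
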
[gpadmin@gp6mdw greenplum]$
[gpadmin@gp6mdw greenplum]$ psql postgres -c "select * from gp_segment_configuration order by hostname, port;"
 dbid | content | role | preferred_role | mode | status | port  | hostname | address |                datadir
------+---------+------+----------------+------+--------+-------+----------+---------+----------------------------------------
    1 |      -1 | p    | p              | n    | u      |  5483 | gp6mdw   | gp6mdw  | /disk1/gpdata/gpmaster/gpseg-1
    2 |       0 | p    | p              | s    | u      | 40000 | gp6sdw3  | gp6sdw3 | /disk1/gpdata/gpsegment/primary/gpseg0
    3 |       1 | p    | p              | s    | u      | 40001 | gp6sdw3  | gp6sdw3 | /disk2/gpdata/gpsegment/primary/gpseg1
    4 |       2 | p    | p              | s    | u      | 40002 | gp6sdw3  | gp6sdw3 | /disk3/gpdata/gpsegment/primary/gpseg2
   11 |       3 | m    | m              | s    | u      | 50000 | gp6sdw3  | gp6sdw3 | /disk1/gpdata/gpsegment/mirror/gpseg3
   12 |       4 | m    | m              | s    | u      | 50001 | gp6sdw3  | gp6sdw3 | /disk2/gpdata/gpsegment/mirror/gpseg4
   13 |       5 | m    | m              | s    | u      | 50002 | gp6sdw3  | gp6sdw3 | /disk3/gpdata/gpsegment/mirror/gpseg5
    5 |       3 | p    | p              | s    | u      | 40000 | gp6sdw4  | gp6sdw4 | /disk1/gpdata/gpsegment/primary/gpseg3
    6 |       4 | p    | p              | s    | u      | 40001 | gp6sdw4  | gp6sdw4 | /disk2/gpdata/gpsegment/primary/gpseg4
    7 |       5 | p    | p              | s    | u      | 40002 | gp6sdw4  | gp6sdw4 | /disk3/gpdata/gpsegment/primary/gpseg5
    8 |       0 | m    | m              | s    | u      | 50000 | gp6sdw4  | gp6sdw4 | /disk1/gpdata/gpsegment/mirror/gpseg0
    9 |       1 | m    | m              | s    | u      | 50001 | gp6sdw4  | gp6sdw4 | /disk2/gpdata/gpsegment/mirror/gpseg1
   10 |       2 | m    | m              | s    | u      | 50002 | gp6sdw4  | gp6sdw4 | /disk3/gpdata/gpsegment/mirror/gpseg2
(13 rows)

[gpadmin@gp6mdw greenplum]$
[gpadmin@gp6mdw greenplum]$ psql postgres -c "select * from gp_stat_replication order by gp_Segment_id;"
 gp_segment_id | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port |        backend_start         | backend_xmin |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state | sync_error
---------------+-----+----------+---------+------------------+-------------+-----------------+-------------+------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------+------------
             0 | 195 |       10 | gpadmin | gp_walreceiver   | 10.191.7.17 |                 |       60234 | 2024-01-31 19:18:50.937193+08 |             | streaming | 0/8062D8B0    | 0/8062D8B0     | 0/8062D8B0     | 0/8062D8B0      |             1 | sync       | none
             1 | 214 |       10 | gpadmin | gp_walreceiver   | 10.191.7.17 |                 |       50022 | 2024-01-31 19:18:55.503663+08 |             | streaming | 0/8044FF50    | 0/8044FF50     | 0/8044FF50     | 0/8044FF50      |             1 | sync       | none
             2 | 216 |       10 | gpadmin | gp_walreceiver   | 10.191.7.17 |                 |       65492 | 2024-01-31 19:18:55.746114+08 |             | streaming | 0/806D5CB0    | 0/806D5CB0     | 0/806D5CB0     | 0/806D5CB0      |             1 | sync       | none
             3 | 263 |       10 | gpadmin | gp_walreceiver   | 10.191.7.16 |                 |       30002 | 2024-01-31 19:18:55.535799+08 |             | streaming | 0/80510300    | 0/80510300     | 0/80510300     | 0/80510300      |             1 | sync       | none
             4 | 266 |       10 | gpadmin | gp_walreceiver   | 10.191.7.16 |                 |       49112 | 2024-01-31 19:18:55.714957+08 |             | streaming | 0/8272BC28    | 0/8272BC28     | 0/8272BC28     | 0/8272BC28      |             1 | sync       | none
             5 | 264 |       10 | gpadmin | gp_walreceiver   | 10.191.7.16 |                 |       51718 | 2024-01-31 19:18:55.550176+08 |             | streaming | 0/80562E30    | 0/80562E30     | 0/80562E30     | 0/80562E30      |             1 | sync       | none
(6 rows)
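
gp_stat_replication on the master still shows the walsender for content 2 (the primary on gp6sdw4:40002) in state 'streaming' to 10.191.7.17, matching the wal receiver process in the ps output above. A plausible explanation (an assumption, not confirmed here) is that the master judges a mirror by its primary's replication state, so as long as this walsender stays in 'streaming', the catalog keeps the mirror marked up even though it no longer accepts connections. The affected pair can be watched with:

psql postgres -c "select gp_segment_id, state, sent_location, replay_location from gp_stat_replication where gp_segment_id = 2;"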

Step to reproduce the behavior

cobolbaby · Jan 31 '24 16:01