
gprecoverseg recovery operations should refer to pg_catalog.gp_segment_configuration.address rather than the server hostname

Open CC-Hsu opened this issue 5 years ago • 5 comments

Greenplum version or build

GPDB v6.x

OS version and uname -a

RHEL/CentOS 7.x

autoconf options used ( config.status --config )

GPDB Enterprise Edition

Installation information ( pg_config )

GPDB Enterprise Edition

Expected behavior

Operations inside gprecoverseg (pg_rewind or pg_basebackup) should follow server address information stored in pg_catalog.gp_segment_configuration.

Actual behavior

gprecoverseg performs restore operations using the servers' hostnames.

Steps to reproduce the behavior

Crash any segment instance, then run gprecoverseg to repair the cluster.

Description

According to the GPDB installation guide, we usually initialize GPDB clusters as follows.

  1. One external NIC and one interconnect NIC, both set up with fault-tolerant network bonding.
  2. On every server we place the same content in /etc/hosts, as shown below, to ensure that (a) hostnames are aligned with the external network, while (b) interconnect NICs are labeled according to the conventional GPDB naming rule.
[gpadmin@mdwee ~]$ gpssh -f ~/allhosts -e 'hostname'
[sdw1ee] hostname
[sdw1ee] sdw1ee
[ mdwee] hostname
[ mdwee] mdwee
[sdw2ee] hostname
[sdw2ee] sdw2ee
[gpadmin@mdwee ~]$ 
[gpadmin@mdwee ~]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

# External NICs, 1Gbs network
192.168.1.10   mdwee
192.168.1.11   sdw1ee
192.168.1.12   sdw2ee

# Interconnect NICs, 10Gbs network
10.18.0.185   mdw
10.18.0.150   sdw1
10.18.0.182   sdw2
[gpadmin@mdwee ~]$ 
  3. Fill hostfile_gpinitsystem with the sdw* labels.
[gpadmin@mdwee ~]$ cat ~/gpconfigs/hostfile_gpinitsystem
sdw1
sdw2
[gpadmin@mdwee ~]$ 
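
To make the two naming schemes concrete, here is a minimal Python sketch that parses the /etc/hosts entries shown above and reports which network each name resolves to. The helper names (`parse_hosts`, `on_interconnect`) are hypothetical, and the `/24` interconnect subnet is an assumption made for illustration; only the IPs and names come from the file above.

```python
import ipaddress

# Entries from the /etc/hosts file shown above (loopback lines omitted).
HOSTS_TEXT = """
# External NICs, 1Gbs network
192.168.1.10   mdwee
192.168.1.11   sdw1ee
192.168.1.12   sdw2ee

# Interconnect NICs, 10Gbs network
10.18.0.185   mdw
10.18.0.150   sdw1
10.18.0.182   sdw2
"""

# Assumption for illustration only: the interconnect is the 10.18.0.0/24 subnet.
INTERCONNECT_NET = ipaddress.ip_network("10.18.0.0/24")


def parse_hosts(text):
    """Return {name: ip} for every name found in hosts-file formatted text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table[name] = ipaddress.ip_address(ip)
    return table


def on_interconnect(name, table):
    """True if the name resolves to an IP inside the interconnect subnet."""
    return table[name] in INTERCONNECT_NET


hosts = parse_hosts(HOSTS_TEXT)
for name in ("sdw1ee", "sdw1"):
    network = "interconnect" if on_interconnect(name, hosts) else "external"
    print(name, hosts[name], network)
```

Both names belong to the same physical server, but only the sdw* label routes traffic over the 10Gbs interconnect.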

This gives us a GPDB cluster configured as in the output below: the servers' hostnames are stored in pg_catalog.gp_segment_configuration.hostname and the conventional GPDB server labels in pg_catalog.gp_segment_configuration.address. The addresses are resolved via the /etc/hosts file.

postgres=# select * from gp_segment_configuration ;
 dbid | content | role | preferred_role | mode | status | port | hostname | address |       datadir        
------+---------+------+----------------+------+--------+------+----------+---------+----------------------
    1 |      -1 | p    | p              | n    | u      | 5432 | mdwee    | mdw     | /data/master/gpseg-1
    3 |       1 | p    | p              | s    | u      | 6000 | sdw2ee   | sdw2    | /data/primary/gpseg1
    5 |       1 | m    | m              | s    | u      | 7000 | sdw1ee   | sdw1    | /data/mirror/gpseg1
    2 |       0 | p    | p              | s    | u      | 6000 | sdw1ee   | sdw1    | /data/primary/gpseg0
    4 |       0 | m    | m              | s    | u      | 7000 | sdw2ee   | sdw2    | /data/mirror/gpseg0
(5 rows)

postgres=# 
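
As a quick check of the catalog contents, the following Python sketch lists the segments whose hostname and address columns differ. The rows are copied from the query output above (the preferred_role, mode and status columns are left out); the `Segment` tuple and the `split_names` helper are hypothetical names used only for illustration.

```python
from collections import namedtuple

Segment = namedtuple("Segment", "dbid content role hostname address port datadir")

# Rows copied from the gp_segment_configuration output above.
SEGMENTS = [
    Segment(1, -1, "p", "mdwee", "mdw", 5432, "/data/master/gpseg-1"),
    Segment(3, 1, "p", "sdw2ee", "sdw2", 6000, "/data/primary/gpseg1"),
    Segment(5, 1, "m", "sdw1ee", "sdw1", 7000, "/data/mirror/gpseg1"),
    Segment(2, 0, "p", "sdw1ee", "sdw1", 6000, "/data/primary/gpseg0"),
    Segment(4, 0, "m", "sdw2ee", "sdw2", 7000, "/data/mirror/gpseg0"),
]


def split_names(segments):
    """Return the dbids whose hostname and address columns differ."""
    return sorted(s.dbid for s in segments if s.hostname != s.address)


print(split_names(SEGMENTS))  # → [1, 2, 3, 4, 5]
```

In this cluster every instance has distinct hostname and address values, so any tool that connects by hostname ends up on the external network.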

However, we recently found that gprecoverseg uses the hostnames of the affected segment hosts rather than the addresses listed in gp_segment_configuration.

This keeps the external NICs busy and causes gprecoverseg operations to fail because of the slow network speed on the external LAN.

Can the GPDB team improve gprecoverseg so that it refers to the address column of pg_catalog.gp_segment_configuration rather than the server's hostname?

By the way, if any other GPDB tools do not refer to the address column, we hope you can improve them in the same way.
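
To illustrate the request, here is a rough Python sketch of the host selection we have in mind. It is not gprecoverseg's actual code: `live_peer` and `basebackup_args` are hypothetical helpers, the failed mirror (dbid 4) is an assumed scenario, and the command line is simplified to the standard pg_basebackup options `-h`, `-p` and `-D`, omitting whatever additional options the real utility passes.

```python
from collections import namedtuple

Segment = namedtuple("Segment", "dbid content role hostname address port datadir")

# The content-0 pair from the gp_segment_configuration output in this issue.
SEGMENTS = [
    Segment(2, 0, "p", "sdw1ee", "sdw1", 6000, "/data/primary/gpseg0"),
    Segment(4, 0, "m", "sdw2ee", "sdw2", 7000, "/data/mirror/gpseg0"),
]


def live_peer(failed, segments):
    """Find the other segment that serves the same content ID."""
    for s in segments:
        if s.content == failed.content and s.dbid != failed.dbid:
            return s
    raise LookupError("no peer for content %d" % failed.content)


def basebackup_args(failed, segments, use_address=True):
    """Build a simplified pg_basebackup command that rebuilds `failed`."""
    source = live_peer(failed, segments)
    # The behavior requested here: connect via `address`, not `hostname`.
    host = source.address if use_address else source.hostname
    return ["pg_basebackup", "-h", host, "-p", str(source.port), "-D", failed.datadir]


failed_mirror = SEGMENTS[1]  # assumed scenario: the mirror with dbid 4 has crashed
print(basebackup_args(failed_mirror, SEGMENTS, use_address=False))  # external LAN
print(basebackup_args(failed_mirror, SEGMENTS, use_address=True))   # interconnect
```

With `use_address=True` the copy connects to sdw1, which resolves to the interconnect, instead of sdw1ee on the external LAN.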

Best Regards.

CC-Hsu, Nov 01 '20 05:11

Just for reference, this topic was also discussed in a mailing list thread.

ashwinstar, Nov 03 '20 00:11

Thanks for letting us know! We have added this to our list of work and will review the reproduction. We will also take a look at the mailing list thread Ashwin linked to and at this thread: Hostname and Address Semantics in GPDB

kalensk, Nov 03 '20 00:11

Also, this issue is a duplicate of https://github.com/greenplum-db/gpdb/issues/9060

ashwinstar, Nov 03 '20 00:11

For the long term, just throwing it out there: we may need to rename those two columns in gp_segment_configuration to names which clearly communicate the purpose and usage. The address column in particular has a very confusing name.

ashwinstar, Nov 03 '20 00:11

> clearly communicate the purpose and usage

I think Greenplum (or maybe just me) still needs to clarify and agree on the semantics of hostname and address. And yes, a rename may be helpful once we get clarity on what we even want the semantics to be.

kalensk, Nov 03 '20 00:11