Recovery operations of gprecoverseg should refer to pg_catalog.gp_segment_configuration.address rather than the server hostname
Greenplum version or build
GPDB v6.x
OS version and uname -a
RHEL/CentOS 7.x
autoconf options used ( config.status --config )
GPDB Enterprise Edition
Installation information ( pg_config )
GPDB Enterprise Edition
Expected behavior
Operations inside gprecoverseg (pg_rewind or pg_basebackup) should use the server address stored in pg_catalog.gp_segment_configuration.address.
Actual behavior
gprecoverseg performs restore operations using the servers' hostnames.
Steps to reproduce the behavior
Crash any segment instance, then run gprecoverseg to repair the cluster.
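A minimal reproduction sketch, assuming the cluster layout described below (the segment host and data directory are only examples; stopping any primary segment will do):
[gpadmin@mdwee ~]$ ssh sdw1 'pg_ctl stop -m immediate -D /data/primary/gpseg0'   # simulate a primary segment crash
[gpadmin@mdwee ~]$ gprecoverseg -a                                               # repair the cluster in place
# While the recovery runs, watch which NIC on the segment hosts carries the pg_rewind/pg_basebackup traffic.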
Description
According to the GPDB installation guide, we usually initialize GPDB clusters as follows.
- One external NIC and one interconnect NIC, both set up with fault-tolerant network bonding.
- On every server we place the same content in /etc/hosts, as below, to ensure that (a) hostnames align with the external network, while (b) the interconnect NICs are labeled with the conventional GPDB naming scheme.
[gpadmin@mdwee ~]$ gpssh -f ~/allhosts -e 'hostname'
[sdw1ee] hostname
[sdw1ee] sdw1ee
[ mdwee] hostname
[ mdwee] mdwee
[sdw2ee] hostname
[sdw2ee] sdw2ee
[gpadmin@mdwee ~]$
[gpadmin@mdwee ~]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# External NICs, 1Gbs network
192.168.1.10 mdwee
192.168.1.11 sdw1ee
192.168.1.12 sdw2ee
# Interconnect NICs, 10Gbs network
10.18.0.185 mdw
10.18.0.150 sdw1
10.18.0.182 sdw2
[gpadmin@mdwee ~]$
- Fill hostfile_gpinitsystem with the sdw* labels.
[gpadmin@mdwee ~]$ cat ~/gpconfigs/hostfile_gpinitsystem
sdw1
sdw2
[gpadmin@mdwee ~]$
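For completeness, initialization for this layout would then run gpinitsystem against that host file (the cluster config file name below is the documented default and an assumption on our part):
[gpadmin@mdwee ~]$ gpinitsystem -c ~/gpconfigs/gpinitsystem_config -h ~/gpconfigs/hostfile_gpinitsystem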
This yields a GPDB cluster configured like the output below: the servers' hostnames are stored in pg_catalog.gp_segment_configuration.hostname, the conventional GPDB server labels in pg_catalog.gp_segment_configuration.address, and the addresses are resolved via the /etc/hosts file.
postgres=# select * from gp_segment_configuration ;
 dbid | content | role | preferred_role | mode | status | port | hostname | address |       datadir
------+---------+------+----------------+------+--------+------+----------+---------+----------------------
    1 |      -1 | p    | p              | n    | u      | 5432 | mdwee    | mdw     | /data/master/gpseg-1
    3 |       1 | p    | p              | s    | u      | 6000 | sdw2ee   | sdw2    | /data/primary/gpseg1
    5 |       1 | m    | m              | s    | u      | 7000 | sdw1ee   | sdw1    | /data/mirror/gpseg1
    2 |       0 | p    | p              | s    | u      | 6000 | sdw1ee   | sdw1    | /data/primary/gpseg0
    4 |       0 | m    | m              | s    | u      | 7000 | sdw2ee   | sdw2    | /data/mirror/gpseg0
(5 rows)
postgres=#
However, we recently found that when performing a gprecoverseg operation,
gprecoverseg refers to the hostnames of the affected segment hosts rather than the address listed in gp_segment_configuration.
This keeps the external NICs busy and causes gprecoverseg operations to fail due to the slow network speed on the external LAN.
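To illustrate with the host mapping above: the two names for the same server resolve to different subnets, so recovery traffic addressed by hostname goes over the 1Gbs external NICs instead of the 10Gbs interconnect. A quick check (only a sketch against the /etc/hosts file shown earlier):
[gpadmin@mdwee ~]$ getent hosts sdw1ee sdw1
192.168.1.11    sdw1ee
10.18.0.150     sdw1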
Can the GPDB team improve gprecoverseg so that it refers to the address column of pg_catalog.gp_segment_configuration rather than the server's hostname?
By the way, if there are any other GPDB tools that do not refer to the address column, we hope you can improve them in the same way.
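As a sketch of what we are asking for: the recovery source for a failed segment can already be looked up from the address column, and that is the name the recovery connection should target (the pg_basebackup mention is only illustrative of the idea, not of gprecoverseg's internals):
[gpadmin@mdwee ~]$ psql -d postgres -Atc "select address, port from gp_segment_configuration where content = 0 and role = 'p'"
sdw1|6000
# Recovery traffic for that segment should target sdw1 (interconnect), e.g. pg_basebackup -h sdw1 -p 6000 ..., not sdw1ee (external).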
Best Regards.
Just for reference, this topic was also discussed in a mailing list thread.
Thanks for letting us know! We have added this to our list of work and will review the reproduction. Also, take a look at the mailing list thread Ashwin linked to and this thread: Hostname and Address Semantics in GPDB
Also, this issue is a duplicate of https://github.com/greenplum-db/gpdb/issues/9060
For the long term, just throwing it out there: maybe we need to rename those 2 columns in gp_segment_configuration to names that clearly communicate the purpose and usage. Especially address, which is a very confusing column name.
clearly communicate the purpose and usage
I think Greenplum (or maybe just me) still needs to clarify and agree on the semantics of hostname and address. And yes a rename may be helpful once we get clarity on what we want the semantics to even be.