gpdb
Address already in use (bind errno 98)
Greenplum version or build
4.3 & 6X_STABLE
OS version and uname -a
Linux x86_64 GNU/Linux
autoconf options used ( config.status --config )
Installation information ( pg_config )
Expected behavior
Actual behavior
LOG: (58M01) Master unable to connect to seg2 xxxx:3176 with options : FATAL: Interconnect Error: Could not set up tcp listener socket.
DETAIL: Address already in use (bind errno 98)
Step to reproduce the behavior
I think the function setupTCPListeningSocket should set the SO_REUSEADDR option on the socket before calling bind().
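A minimal sketch of what that suggestion could look like, written as a standalone program fragment rather than as a patch to setupTCPListeningSocket. The helper name open_reusable_listener, its parameters, and the choice of the loopback address are assumptions made only for illustration; none of it is taken from the gpdb source.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not gpdb code): open a TCP listener on the IPv4
 * loopback address with SO_REUSEADDR set before bind().
 * Pass port 0 to let the kernel pick a free port.
 * Returns the listening fd, or -1 on error. On success the bound port
 * is stored in *out_port in host byte order.
 */
int
open_reusable_listener(uint16_t port, uint16_t *out_port)
{
    struct sockaddr_in addr;
    socklen_t   len = sizeof(addr);
    int         one = 1;
    int         fd;

    assert(out_port != NULL);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* The option only influences bind() if it is set before bind(). */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0)
        goto fail;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0)
        goto fail;
    if (listen(fd, 16) < 0)
        goto fail;

    /* Read back the port, which matters when the caller passed 0. */
    if (getsockname(fd, (struct sockaddr *) &addr, &len) < 0)
        goto fail;

    *out_port = ntohs(addr.sin_port);
    return fd;

fail:
    close(fd);
    return -1;
}
```

A caller would check the return value and, on -1, inspect errno; EADDRINUSE (98 on Linux) is the error reported in the log above.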

Does that mean that port 3176 is already in use? Can we find out what is using it?
Do we have the same problem in setupUDPListeningSocket()?
By the way, what exactly is the problem here? Is it a transient problem, does the error stop occurring after some time?
The problem is intermittent; it only occurs with some probability. See the function StreamServerPort for more details:
Without the SO_REUSEADDR flag, a new postmaster can't be started
right away after a stop or crash, giving "address already in use"
error on TCP ports.
Does that mean that port 3176 is already in use? Can we find out what is using it?
The socket on port 3176 is in the TIME_WAIT state. When a lot of TCP connections are created at the same time, there is some probability that a TCP port number is reused, and some of these ports may still be in the TIME_WAIT state. Without the SO_REUSEADDR flag, we then get an "address already in use" error on TCP ports.
I see, thank you for the details. UDP connections won't suffer from this issue, setupUDPListeningSocket() doesn't need any fix. Am I right?
yeah
Do we have the same problem in setupUDPListeningSocket()? By the way, what exactly is the problem here? Is it a transient problem, does the error stop occurring after some time?
Interconnect/UDP could be even worse. All sockets for interconnect/UDP are bound to *:<port> and the sockets for interconnect/TCP are bound to a unicast IP address. It means all sockets for interconnect/UDP share the same port space.
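The difference between a wildcard bind and a unicast bind can be seen with plain sockets. The sketch below is only an illustration of general socket behaviour on Linux, not interconnect code; the helper name bind_udp and the loopback addresses used in the usage note are assumptions chosen for the example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not gpdb code): bind a UDP socket to ip:port.
 * "0.0.0.0" is the wildcard address; port 0 lets the kernel choose.
 * Returns the fd, or -1 on failure. On success the bound port is
 * stored in *out_port in host byte order.
 */
int
bind_udp(const char *ip, uint16_t port, uint16_t *out_port)
{
    struct sockaddr_in addr;
    socklen_t   len = sizeof(addr);
    int         fd;

    assert(ip != NULL && out_port != NULL);

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);

    /* Any failing step closes the socket and reports -1. */
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *) &addr, &len) < 0)
    {
        close(fd);
        return -1;
    }

    *out_port = ntohs(addr.sin_port);
    return fd;
}
```

On a Linux host, where every address in 127.0.0.0/8 is local, binding "0.0.0.0" with port 0 and then trying "127.0.0.1" on the port that was chosen should fail with EADDRINUSE, because the wildcard socket already owns that port on every address. Binding "127.0.0.1" and then "127.0.0.2" on the same port should succeed, since each unicast address has its own port space.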
@constzl Could you show me the contents of gp_segment_configuration?
select dbid, hostname, address from gp_segment_configuration