Pinot Server fails to start in offline/air-gapped environments due to eager evaluation of NetUtils.getHostAddress()
Current Behavior
When attempting to start the Pinot Server (version 1.4.0) in an environment with no outbound internet access (e.g., a firewalled Docker Swarm overlay network), the server fails to initialize.
The startup process throws java.io.UncheckedIOException: java.net.SocketException: Network is unreachable. This occurs because BaseServerStarter.init calls NetUtils.getHostAddress(), which determines the host's IP by connecting a UDP (datagram) socket to an external address (currently 8.8.8.8). In an offline environment there is no route to that address, so the connect call fails.
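The probe technique described above can be sketched as follows. This is a minimal illustration, not Pinot's actual NetUtils source: the class and method names are hypothetical, probe targets are assumed to be IP literals, and the sketch returns an empty Optional where Pinot's call lets the failure propagate (on Java 17, DatagramSocket.connect(InetAddress, int) wraps it in the UncheckedIOException seen in the stack trace). Connecting a UDP socket sends no packets; the OS only has to find a route and choose a source address, which is exactly the step that fails when there is no route to 8.8.8.8.

```java
import java.io.IOException;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.Optional;

class UdpProbeSketch {

  // Hypothetical helper: returns the local address the OS would use to reach probeIp.
  // A UDP connect() sends nothing on the wire; it performs a route lookup and binds a
  // source address. With no route to probeIp (e.g. no default gateway), it fails with
  // "Network is unreachable". probeIp is assumed to be an IP literal (no DNS lookup).
  static Optional<InetAddress> probeLocalAddress(String probeIp, int port) {
    try (DatagramSocket socket = new DatagramSocket()) {
      // connect(SocketAddress) reports failure as a checked SocketException.
      socket.connect(new InetSocketAddress(probeIp, port));
      return Optional.of(socket.getLocalAddress());
    } catch (IOException e) {
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    // On a host with no route to 8.8.8.8 this prints "Optional.empty".
    System.out.println(probeLocalAddress("8.8.8.8", 53));
    // The loopback route always exists, so this yields the loopback address.
    System.out.println(probeLocalAddress("127.0.0.1", 53));
  }
}
```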
Steps to Reproduce
1. Configure a Pinot Server (v1.4.0) in an environment that has no access to the public internet (e.g., a Docker container with restricted networking, or a fully air-gapped VM).
2. Attempt to start the server using the standard StartServer command. Note that the failure occurs even when the -serverHost parameter is provided explicitly; that parameter is expected to set pinot.server.netty.host and bypass the network lookup.
3. The server process fails during initialization.
4. Observe the stack trace (see below).
Error Log / Stack Trace
The following exception is thrown during startup:
2025/11/05 10:35:20.073 INFO [StartServerCommand] [main] Executing command: StartServer -clusterName [my-cluster] -serverHost pinot-server-node1 -serverPort 8098 -serverAdminPort 8097 -serverGrpcPort 8090 -serverMultistageServerPort 0 -serverMultistageRunnerPort 0 -dataDir /var/lib/pinot/server -segmentDir /var/lib/pinot/segments -zkAddress zookeeper1:2181
2025/11/05 10:35:20.078 INFO [StartServiceManagerCommand] [main] Executing command: StartServiceManager -clusterName [my-cluster] -zkAddress zookeeper1:2181 -port -1 -bootstrapServices []
2025/11/05 10:35:20.079 INFO [StartServiceManagerCommand] [main] Starting a Pinot [SERVICE_MANAGER] at 0.217s since launch
2025/11/05 10:35:20.081 INFO [StartServiceManagerCommand] [main] Started Pinot [SERVICE_MANAGER] instance [ServiceManager_pinot-server-node1_-1] at 0.219s since launch
2025/11/05 10:35:20.082 INFO [StartServiceManagerCommand] [Start a Pinot [SERVER]] Starting a Pinot [SERVER] at 0.22s since launch
2025/11/05 10:35:20.354 ERROR [StartServiceManagerCommand] [Start a Pinot [SERVER]] Failed to start a Pinot [SERVER] at 0.493 since launch
java.io.UncheckedIOException: java.net.SocketException: Network is unreachable
at java.base/sun.nio.ch.DatagramSocketAdaptor.connect(DatagramSocketAdaptor.java:120)
at java.base/java.net.DatagramSocket.connect(DatagramSocket.java:474)
at org.apache.pinot.spi.utils.NetUtils.getHostAddress(NetUtils.java:62)
at org.apache.pinot.server.starter.helix.BaseServerStarter.init(BaseServerStarter.java:198)
at org.apache.pinot.tools.service.PinotServiceManager.startServer(PinotServiceManager.java:166)
at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:97)
at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.lambda$run$0(StartServiceManagerCommand.java:267)
at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:293)
at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.run(StartServiceManagerCommand.java:267)
Caused by: java.net.SocketException: Network is unreachable
at java.base/sun.nio.ch.Net.connect0(Native Method)
at java.base/sun.nio.ch.Net.connect(Net.java:579)
at java.base/sun.nio.ch.DatagramChannelImpl.connect(DatagramChannelImpl.java:1249)
at java.base/sun.nio.ch.DatagramSocketAdaptor.connectInternal(DatagramSocketAdaptor.java:91)
at java.base/sun.nio.ch.DatagramSocketAdaptor.connect(DatagramSocketAdaptor.java:118)
... 8 more
Analysis & Root Cause
The root cause is in org.apache.pinot.server.starter.helix.BaseServerStarter.init (line 198 in version 1.4.0, or line 205 on master):
_hostname = _serverConf.getProperty(Helix.KEY_OF_SERVER_NETTY_HOST,
_serverConf.getProperty(Helix.SET_INSTANCE_ID_TO_HOSTNAME_KEY, false) ? NetUtils.getHostnameOrAddress()
: NetUtils.getHostAddress());
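The issue is that the second argument (the default value) to _serverConf.getProperty() is evaluated eagerly: Java evaluates every method argument before invoking the method, so the ternary expression is computed before getProperty() ever checks whether the key is set.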
This means that NetUtils.getHostAddress() (or NetUtils.getHostnameOrAddress()) is always called, even if the user has explicitly provided a value for pinot.server.netty.host (Helix.KEY_OF_SERVER_NETTY_HOST) in their configuration to avoid this network lookup.
In an offline environment, this eager call to NetUtils.getHostAddress() fails with Network is unreachable, preventing the server from starting.
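The eager-evaluation problem, and a lazy alternative, can be illustrated with a self-contained sketch. It does not use Pinot classes: getProperty here is a hypothetical stand-in for the configuration lookup, networkLookup() stands in for NetUtils.getHostAddress() and performs no real I/O, and the Supplier-based variant is one possible shape for a fix, not an existing Pinot API.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class LazyDefaultSketch {
  // Counts how often the (in real life, network-touching) default is computed.
  static final AtomicInteger LOOKUPS = new AtomicInteger();

  // Hypothetical stand-in for NetUtils.getHostAddress(); no real I/O, fixed made-up IP.
  static String networkLookup() {
    LOOKUPS.incrementAndGet();
    return "10.0.0.5";
  }

  // Same shape as getProperty(key, defaultValue): the default arrives already
  // computed, so the caller paid for it before this method even runs.
  static String getProperty(Map<String, String> conf, String key, String defaultValue) {
    return conf.getOrDefault(key, defaultValue);
  }

  // Lazy variant: the default is computed only when the key is absent.
  static String getPropertyLazy(Map<String, String> conf, String key, Supplier<String> defaultSupplier) {
    String value = conf.get(key);
    return value != null ? value : defaultSupplier.get();
  }

  public static void main(String[] args) {
    Map<String, String> conf = Map.of("pinot.server.netty.host", "pinot-server-node1");

    // Eager: networkLookup() runs even though the key is set.
    String eager = getProperty(conf, "pinot.server.netty.host", networkLookup());
    System.out.println(eager + " lookups=" + LOOKUPS.get()); // pinot-server-node1 lookups=1

    // Lazy: the supplier is never invoked because the key is present.
    String lazy = getPropertyLazy(conf, "pinot.server.netty.host", LazyDefaultSketch::networkLookup);
    System.out.println(lazy + " lookups=" + LOOKUPS.get()); // pinot-server-node1 lookups=1
  }
}
```

The same idea applies to the ternary in BaseServerStarter.init: wrapping both NetUtils branches in a deferred computation means neither runs when pinot.server.netty.host is configured.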
Proposed Solution
I suggest a two-part solution, plus one addition:
- Primary fix (lazy evaluation): Refactor the logic in BaseServerStarter.init so that it calls NetUtils only if Helix.KEY_OF_SERVER_NETTY_HOST is not already defined in the configuration.
- Robustness fix (NetUtils): Make NetUtils.getHostAddress() more robust. Instead of throwing an exception when the default probe address (e.g., 8.8.8.8) is unreachable, it could log a warning and fall back to the first available non-loopback IP address.
- Addition: Make the probe address(es) configurable.
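The robustness fix and the configurable probe list can be sketched together as a try-and-fallback chain. This is an illustration under assumptions, not a patch against Pinot: the class, the methods, and DEFAULT_PROBE_HOSTS are hypothetical; it logs via java.util.logging rather than Pinot's logger; it prefers IPv4 addresses; and returning the loopback address as a last resort goes beyond what the proposal states.

```java
import java.io.IOException;
import java.net.DatagramSocket;
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.Optional;
import java.util.logging.Logger;

class HostAddressFallbackSketch {
  private static final Logger LOG = Logger.getLogger(HostAddressFallbackSketch.class.getName());

  // Hypothetical default; the proposal is to let operators override this list.
  static final List<String> DEFAULT_PROBE_HOSTS = List.of("8.8.8.8");

  // UDP route-lookup probe; the port is arbitrary (assumption), probeIp an IP literal.
  static Optional<InetAddress> probe(String probeIp) {
    try (DatagramSocket socket = new DatagramSocket()) {
      socket.connect(new InetSocketAddress(probeIp, 53));
      return Optional.of(socket.getLocalAddress());
    } catch (IOException e) {
      return Optional.empty();
    }
  }

  // Fallback: first IPv4 address on an interface that is up and not loopback.
  static Optional<InetAddress> firstNonLoopbackAddress() {
    try {
      Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
      if (nics == null) {
        return Optional.empty();
      }
      for (NetworkInterface nic : Collections.list(nics)) {
        if (!nic.isUp() || nic.isLoopback()) {
          continue;
        }
        for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
          if (addr instanceof Inet4Address && !addr.isLoopbackAddress()) {
            return Optional.of(addr);
          }
        }
      }
    } catch (SocketException e) {
      LOG.warning("Could not enumerate network interfaces: " + e.getMessage());
    }
    return Optional.empty();
  }

  // Try each configured probe in order, warn on failure, then fall back.
  static String resolveHostAddress(List<String> probeHosts) {
    for (String probeIp : probeHosts) {
      Optional<InetAddress> local = probe(probeIp);
      if (local.isPresent()) {
        return local.get().getHostAddress();
      }
      LOG.warning("Probe address " + probeIp + " is unreachable, trying the next option");
    }
    // Last resort (assumption beyond the proposal): loopback instead of an exception.
    return firstNonLoopbackAddress().orElseGet(InetAddress::getLoopbackAddress).getHostAddress();
  }

  public static void main(String[] args) {
    System.out.println(resolveHostAddress(DEFAULT_PROBE_HOSTS));
  }
}
```

With a configurable probe list, an air-gapped deployment could point the probe at an address that is routable inside its own network instead of 8.8.8.8.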
Environment
- Pinot Version: 1.4.0
- Java Version: Amazon Corretto 17
- Deployment: Docker Swarm (on an internal overlay network)
- OS: Ubuntu 24.04
We have identified a temporary workaround for this issue.
Setting pinot.set.instance.id.to.hostname=true forces BaseServerStarter.init to take the other branch of the ternary operator, so it calls NetUtils.getHostnameOrAddress() instead of NetUtils.getHostAddress().
The getHostnameOrAddress() method successfully resolves the host's name without making an external network call, thus bypassing the SocketException and allowing the server to start in an offline environment.
Note: This highlights the core problem. Due to the eager evaluation, a NetUtils method is still called unnecessarily, even when the Helix.KEY_OF_SERVER_NETTY_HOST (via -serverHost) is explicitly set. However, this path at least avoids the crash.
Thanks for reporting this. I like that try-and-fallback solution.