Druid 33.0.0: coordinators not reachable when using druid-kubernetes-extensions

Open christian-schlichtherle opened this issue 8 months ago • 9 comments

Affected Version

Apache Druid 33.0.0

Description

When upgrading or deploying a new Druid cluster with the druid-kubernetes-extension, the broker, historical and router nodes cannot talk to the coordinator anymore. The coordinator itself does not log any errors or exceptions.

Exception from broker:

2025-04-29T19:05:51,152 WARN [FilteredHttpServerInventoryView-2] org.jboss.netty.channel.SimpleChannelUpstreamHandler - EXCEPTION, please implement org.jboss.netty.handler.codec.http.HttpContentDecompressor.exceptionCaught() for proper handli
java.nio.channels.UnresolvedAddressException: null                                                                                                                                                                                                
    at java.base/sun.nio.ch.Net.checkAddress(Net.java:149) ~[?:?]                                                                                                                                                                                 
    at java.base/sun.nio.ch.Net.checkAddress(Net.java:157) ~[?:?]                                                                                                                                                                                 
    at java.base/sun.nio.ch.SocketChannelImpl.checkRemote(SocketChannelImpl.java:816) ~[?:?]                                                                                                                                                      
    at java.base/sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:839) ~[?:?]                                                                                                                                                          
    at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.connect(NioClientSocketPipelineSink.java:108) ~[netty-3.10.6.Final.jar:?]                                                                                                   
    at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:70) ~[netty-3.10.6.Final.jar:?]                                                                                                  
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779) ~[netty-3.10.6.Final.jar:?]                                                                                    
    at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:54) ~[netty-3.10.6.Final.jar:?]                                                                                                                 
    at org.jboss.netty.handler.codec.http.HttpClientCodec.handleDownstream(HttpClientCodec.java:97) ~[netty-3.10.6.Final.jar:?]                                                                                                                   
    at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591) ~[netty-3.10.6.Final.jar:?]                                                                                                                 
    at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582) ~[netty-3.10.6.Final.jar:?]                                                                                                                 
    at org.jboss.netty.channel.Channels.connect(Channels.java:634) ~[netty-3.10.6.Final.jar:?]                                                                                                                                                    
    at org.jboss.netty.channel.AbstractChannel.connect(AbstractChannel.java:215) ~[netty-3.10.6.Final.jar:?]                                                                                                                                      
    at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:229) ~[netty-3.10.6.Final.jar:?]                                                                                                                                    
    at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:182) ~[netty-3.10.6.Final.jar:?]                                                                                                                                    
    at org.apache.druid.java.util.http.client.pool.ChannelResourceFactory.generate(ChannelResourceFactory.java:198) ~[druid-processing-33.0.0.jar:33.0.0]                                                                                         
    at org.apache.druid.java.util.http.client.pool.ChannelResourceFactory.generate(ChannelResourceFactory.java:59) ~[druid-processing-33.0.0.jar:33.0.0]                                                                                          
    at org.apache.druid.java.util.http.client.pool.ResourcePool$ResourceHolderPerKey.get(ResourcePool.java:285) ~[druid-processing-33.0.0.jar:33.0.0]                                                                                             
    at org.apache.druid.java.util.http.client.pool.ResourcePool.take(ResourcePool.java:109) ~[druid-processing-33.0.0.jar:33.0.0]                                                                                                                 
    at org.apache.druid.java.util.http.client.NettyHttpClient.go(NettyHttpClient.java:127) ~[druid-processing-33.0.0.jar:33.0.0]                                                                                                                  
    at org.apache.druid.server.coordination.ChangeRequestHttpSyncer.sendSyncRequest(ChangeRequestHttpSyncer.java:247) ~[druid-server-33.0.0.jar:33.0.0]                                                                                           
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]                                                                                                                                                    
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]                                                                                                                                                                   
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]                                                                                                             
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]                                                                                                                                            
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]                                                                                                                                            
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]                                            

Exception from historical:

2025-04-29T19:05:15,778 ERROR [main] org.apache.druid.query.lookup.LookupReferencesManager - Error while trying to get lookup list from coordinator for tier[__default]
org.apache.druid.java.util.common.IOE: Retries exhausted, couldn't fulfill request to [http://druid-druid-coordinators-7889b9b98d-jcrxs:8088/druid/coordinator/v1/lookups/config/__default?detailed=true].
   at org.apache.druid.discovery.DruidLeaderClient.go(DruidLeaderClient.java:219) ~[druid-server-33.0.0.jar:33.0.0]                                                                                                                               
   at org.apache.druid.discovery.DruidLeaderClient.go(DruidLeaderClient.java:133) ~[druid-server-33.0.0.jar:33.0.0]                                                                                                                               
   at org.apache.druid.query.lookup.LookupReferencesManager.fetchLookupsForTier(LookupReferencesManager.java:626) ~[druid-server-33.0.0.jar:33.0.0]                                                                                               
   at org.apache.druid.query.lookup.LookupReferencesManager.tryGetLookupListFromCoordinator(LookupReferencesManager.java:474) ~[druid-server-33.0.0.jar:33.0.0]                                                                                   
   at org.apache.druid.query.lookup.LookupReferencesManager.lambda$getLookupListFromCoordinator$5(LookupReferencesManager.java:451) ~[druid-server-33.0.0.jar:33.0.0]                                                                             
   at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:129) ~[druid-processing-33.0.0.jar:33.0.0]                                                                                                                               
   at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:81) ~[druid-processing-33.0.0.jar:33.0.0]                                                                                                                                
   at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:163) ~[druid-processing-33.0.0.jar:33.0.0]                                                                                                                               
   at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:153) ~[druid-processing-33.0.0.jar:33.0.0]                                                                                                                               
   at org.apache.druid.query.lookup.LookupReferencesManager.getLookupListFromCoordinator(LookupReferencesManager.java:441) [druid-server-33.0.0.jar:33.0.0]                                                                                       
   at org.apache.druid.query.lookup.LookupReferencesManager.getLookupsList(LookupReferencesManager.java:418) [druid-server-33.0.0.jar:33.0.0]                                                                                                     
   at org.apache.druid.query.lookup.LookupReferencesManager.loadLookupsAndInitStateRef(LookupReferencesManager.java:394) [druid-server-33.0.0.jar:33.0.0]                                                                                         
   at org.apache.druid.query.lookup.LookupReferencesManager.start(LookupReferencesManager.java:171) [druid-server-33.0.0.jar:33.0.0]                                                                                                              
   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]                                                                                                                                                       
   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]                                                                                                                                     
   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]                                                                                                                             
   at java.base/java.lang.reflect.Method.invoke(Method.java:569) ~[?:?]                                                                                                                                                                           
   at org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:446) [druid-processing-33.0.0.jar:33.0.0]                                                                                                 
   at org.apache.druid.java.util.common.lifecycle.Lifecycle.start(Lifecycle.java:341) [druid-processing-33.0.0.jar:33.0.0]                                                                                                                        
   at org.apache.druid.guice.LifecycleModule$2.start(LifecycleModule.java:152) [druid-processing-33.0.0.jar:33.0.0]                                                                                                                               
   at org.apache.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:136) [druid-services-33.0.0.jar:33.0.0]                                                                                                                                 
   at org.apache.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:94) [druid-services-33.0.0.jar:33.0.0]                                                                                                                                  
   at org.apache.druid.cli.ServerRunnable.run(ServerRunnable.java:70) [druid-services-33.0.0.jar:33.0.0]                                                                                                                                          
   at org.apache.druid.cli.Main.main(Main.java:112) [druid-services-33.0.0.jar:33.0.0]                               

Exception from router:

2025-04-29T20:02:09,326 WARN [CoordinatorRuleManager-Exec--0] org.apache.druid.discovery.DruidLeaderClient - Request[http://druid-druid-coordinators-7889b9b98d-jcrxs:8088/druid/coordinator/v1/rules] failed.                                    
org.jboss.netty.channel.ChannelException: Faulty channel in resource pool                                                                                                                                                                         
    at org.apache.druid.java.util.http.client.NettyHttpClient.go(NettyHttpClient.java:134) ~[druid-processing-33.0.0.jar:33.0.0]                                                                                                                  
    at org.apache.druid.java.util.http.client.AbstractHttpClient.go(AbstractHttpClient.java:33) ~[druid-processing-33.0.0.jar:33.0.0]                                                                                                             
    at org.apache.druid.discovery.DruidLeaderClient.go(DruidLeaderClient.java:158) ~[druid-server-33.0.0.jar:33.0.0]                                                                                                                              
    at org.apache.druid.discovery.DruidLeaderClient.go(DruidLeaderClient.java:133) ~[druid-server-33.0.0.jar:33.0.0]                                                                                                                              
    at org.apache.druid.server.router.CoordinatorRuleManager.poll(CoordinatorRuleManager.java:135) ~[druid-services-33.0.0.jar:33.0.0]                                                                                                            
    at org.apache.druid.java.util.common.concurrent.ScheduledExecutors$1.call(ScheduledExecutors.java:55) [druid-processing-33.0.0.jar:33.0.0]                                                                                                    
    at org.apache.druid.java.util.common.concurrent.ScheduledExecutors$1.call(ScheduledExecutors.java:51) [druid-processing-33.0.0.jar:33.0.0]                                                                                                    
    at org.apache.druid.java.util.common.concurrent.ScheduledExecutors$2.run(ScheduledExecutors.java:87) [druid-processing-33.0.0.jar:33.0.0]                                                                                                     
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]                                                                                                                                                    
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]                                                                                                                                                                   
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]                                                                                                             
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]                                                                                                                                            
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]                                                                                                                                            
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]                                                                                                                                                                                      
Caused by: java.nio.channels.UnresolvedAddressException                                                                                                                                                                                           
    at java.base/sun.nio.ch.Net.checkAddress(Net.java:149) ~[?:?]                                                                                                                                                                                 
    at java.base/sun.nio.ch.Net.checkAddress(Net.java:157) ~[?:?]                                                                                                                                                                                 
    at java.base/sun.nio.ch.SocketChannelImpl.checkRemote(SocketChannelImpl.java:816) ~[?:?]                                                                                                                                                      
    at java.base/sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:839) ~[?:?]                                                                                                                                                          
    at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.connect(NioClientSocketPipelineSink.java:108) ~[netty-3.10.6.Final.jar:?]                                                                                                   
    at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:70) ~[netty-3.10.6.Final.jar:?]                                                                                                  
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779) ~[netty-3.10.6.Final.jar:?]                                                                                    
    at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:54) ~[netty-3.10.6.Final.jar:?]                                                                                                                 
    at org.jboss.netty.handler.codec.http.HttpClientCodec.handleDownstream(HttpClientCodec.java:97) ~[netty-3.10.6.Final.jar:?]                                                                                                                   
    at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591) ~[netty-3.10.6.Final.jar:?]                                                                                                                 
    at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582) ~[netty-3.10.6.Final.jar:?]                                                                                                                 
    at org.jboss.netty.channel.Channels.connect(Channels.java:634) ~[netty-3.10.6.Final.jar:?]                                                                                                                                                    
    at org.jboss.netty.channel.AbstractChannel.connect(AbstractChannel.java:215) ~[netty-3.10.6.Final.jar:?]                                                                                                                                      
    at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:229) ~[netty-3.10.6.Final.jar:?]                                                                                                                                    
    at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:182) ~[netty-3.10.6.Final.jar:?]                                                                                                                                    
    at org.apache.druid.java.util.http.client.pool.ChannelResourceFactory.generate(ChannelResourceFactory.java:198) ~[druid-processing-33.0.0.jar:33.0.0]                                                                                         
    at org.apache.druid.java.util.http.client.pool.ChannelResourceFactory.generate(ChannelResourceFactory.java:59) ~[druid-processing-33.0.0.jar:33.0.0]                                                                                          
    at org.apache.druid.java.util.http.client.pool.ResourcePool$ResourceHolderPerKey.get(ResourcePool.java:285) ~[druid-processing-33.0.0.jar:33.0.0]                                                                                             
    at org.apache.druid.java.util.http.client.pool.ResourcePool.take(ResourcePool.java:109) ~[druid-processing-33.0.0.jar:33.0.0]                                                                                                                 
    at org.apache.druid.java.util.http.client.NettyHttpClient.go(NettyHttpClient.java:127) ~[druid-processing-33.0.0.jar:33.0.0]                                                                                                                  
    ... 13 more                                               

For reproduction, please use my Helm chart. It works fine with Apache Druid 32.0.1, but breaks with version 33.0.0:

helm repo add druid-charts https://bsure-analytics.github.io/druid-charts
helm repo update
helm upgrade druid druid-charts/druid-dev --create-namespace --install --namespace druid --set druid.spec.image.tag=33.0.0

christian-schlichtherle avatar Apr 29 '25 20:04 christian-schlichtherle

The error messages seem to be talking about the host druid-druid-coordinators-7889b9b98d-jcrxs not being resolvable. I'm not a kubernetes expert so I am not sure what exactly could be causing that. But I wonder what might have changed since Druid 32.0.1. In the older version, was the Coordinator advertising itself with a different hostname? If so, you could customize that with the druid.host runtime property.
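One way to check this diagnosis is to test name resolution from inside one of the failing pods. The following is a minimal sketch, assuming `getent` is available in the container image (not confirmed for the Druid image); `can_resolve` and `check_host` are hypothetical helper names, and the hostname is the one from the logs above.

```shell
#!/bin/sh
# Succeeds only if the system resolver (hosts file, DNS) knows the name.
can_resolve() {
  getent hosts "$1" >/dev/null 2>&1
}

# Print a one-line verdict for the given hostname.
check_host() {
  if can_resolve "$1"; then
    echo "resolvable: $1"
  else
    echo "NOT resolvable: $1"
  fi
}

# Hostname the broker, historical and router tried to reach.
check_host druid-druid-coordinators-7889b9b98d-jcrxs
```

If the name is reported as not resolvable from the broker, historical or router pod, that would be consistent with the UnresolvedAddressException in the stack traces.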

gianm avatar Apr 29 '25 20:04 gianm

Druid 33 changed the default behaviour to use the hostname instead of the IP address for internal communication.

But the coordinator runs as a Deployment, which means the hostname druid-druid-coordinators-7889b9b98d-jcrxs is not an FQDN that can be resolved.

I recommend setting the DRUID_SET_HOST_IP environment variable to 1 to restore the previous behaviour.

See the release notes: https://github.com/apache/druid/releases/tag/druid-33.0.0#33.0.0-upgrade-notes-and-incompatible-changes

https://github.com/apache/druid/pull/17680

Kubernetes deployments
By default, the Docker image now uses the canonical hostname to register services in ZooKeeper for internal communication if you're running Druid in Kubernetes. Otherwise, it uses the IP address. https://github.com/apache/druid/pull/17697.

You can set the environment variable DRUID_SET_HOST_IP to 1 to restore old behavior.

FrankChen021 avatar Apr 30 '25 01:04 FrankChen021

So this change breaks communication with nodes that are deployed as Deployment kind in Kubernetes. To recover the old behavior, I shall set DRUID_SET_HOST_IP=1. My Helm chart is not using environment variables, but generating Java system properties instead. I would like to keep it that way for consistency, so can I use -Ddruid.set.host.ip=1 instead?

christian-schlichtherle avatar Apr 30 '25 08:04 christian-schlichtherle

So this change breaks communication with nodes that are deployed as Deployment kind in Kubernetes. To recover the old behavior, I shall set DRUID_SET_HOST_IP=1. My Helm chart is not using environment variables, but generating Java system properties instead. I would like to keep it that way for consistency, so can I use -Ddruid.set.host.ip=1 instead?

Looking at the source code, it seems like the answer is "no". Here's the relevant code:

if [ -z "${KUBERNETES_SERVICE_HOST}" ]
then
  # Running outside kubernetes, use IP addresses
  DRUID_SET_HOST_IP=${DRUID_SET_HOST_IP:-1}
else
  # Running in kubernetes, so use canonical names
  DRUID_SET_HOST_IP=${DRUID_SET_HOST_IP:-0}
fi

if [ "${DRUID_SET_HOST_IP}" = "1" ]
then
    setKey $SERVICE druid.host $(ip r get 1 | awk '{print $7;exit}')
fi

Well, that clearly doesn't work for Deployment resource kinds.
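To make the outcomes easier to compare, the defaulting logic quoted above can be wrapped in a function. This is only a sketch: `resolve_set_host_ip` is a hypothetical name that is not part of the Druid image, and `10.0.0.1` is a placeholder for whatever value Kubernetes injects as KUBERNETES_SERVICE_HOST.

```shell
#!/bin/sh
# Mirrors the quoted defaulting logic.
#   $1: value of KUBERNETES_SERVICE_HOST (empty outside Kubernetes)
#   $2: value of DRUID_SET_HOST_IP (empty when unset)
resolve_set_host_ip() {
  if [ -z "$1" ]; then
    # Outside Kubernetes: default to IP addresses.
    echo "${2:-1}"
  else
    # Inside Kubernetes: default to canonical hostnames.
    echo "${2:-0}"
  fi
}

resolve_set_host_ip "" ""            # outside Kubernetes, unset: prints 1
resolve_set_host_ip "10.0.0.1" ""    # inside Kubernetes, unset: prints 0
resolve_set_host_ip "10.0.0.1" "1"   # inside Kubernetes, explicit: prints 1
```

Only the environment variable is consulted here, which is why a Java system property cannot flip the switch: the shell script decides before the JVM starts.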

christian-schlichtherle avatar Apr 30 '25 08:04 christian-schlichtherle

Next issue:

08:18:37.017 [main] ERROR org.apache.druid.cli.PullDependencies - Unable to resolve artifacts for [org.apache.druid.extensions.contrib:druid-kubernetes-overlord-extensions:jar:33.0.0 (runtime) -> [] < [central (https://repo1.maven.org/maven2/
org.eclipse.aether.resolution.DependencyResolutionException: Could not find artifact org.apache.druid.extensions.contrib:druid-kubernetes-overlord-extensions:jar:33.0.0 in central (https://repo1.maven.org/maven2/)                             
    at org.eclipse.aether.internal.impl.DefaultRepositorySystem.resolveDependencies(DefaultRepositorySystem.java:342) ~[maven-resolver-impl-1.3.1.jar:1.3.1]                                                                                      
    at org.apache.druid.cli.PullDependencies.downloadExtension(PullDependencies.java:392) [druid-services-33.0.0.jar:33.0.0]                                                                                                                      
    at org.apache.druid.cli.PullDependencies.downloadExtension(PullDependencies.java:346) [druid-services-33.0.0.jar:33.0.0]                                                                                                                      
    at org.apache.druid.cli.PullDependencies.run(PullDependencies.java:292) [druid-services-33.0.0.jar:33.0.0]                                                                                                                                    
    at org.apache.druid.cli.Main.main(Main.java:112) [druid-services-33.0.0.jar:33.0.0]                                                                                                                                                           
Caused by: org.eclipse.aether.resolution.ArtifactResolutionException: Could not find artifact org.apache.druid.extensions.contrib:druid-kubernetes-overlord-extensions:jar:33.0.0 in central (https://repo1.maven.org/maven2/)                    
    at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolve(DefaultArtifactResolver.java:413) ~[maven-resolver-impl-1.3.1.jar:1.3.1]                                                                                                  
    at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolveArtifacts(DefaultArtifactResolver.java:215) ~[maven-resolver-impl-1.3.1.jar:1.3.1]                                                                                         
    at org.eclipse.aether.internal.impl.DefaultRepositorySystem.resolveDependencies(DefaultRepositorySystem.java:325) ~[maven-resolver-impl-1.3.1.jar:1.3.1]                                                                                      
    ... 4 more                                                                                                                                                                                                                                    
Caused by: org.eclipse.aether.transfer.ArtifactNotFoundException: Could not find artifact org.apache.druid.extensions.contrib:druid-kubernetes-overlord-extensions:jar:33.0.0 in central (https://repo1.maven.org/maven2/)                        
    at org.eclipse.aether.connector.basic.ArtifactTransportListener.transferFailed(ArtifactTransportListener.java:48) ~[maven-resolver-connector-basic-1.3.1.jar:1.3.1]                                                                           
    at org.eclipse.aether.connector.basic.BasicRepositoryConnector$TaskRunner.run(BasicRepositoryConnector.java:368) ~[maven-resolver-connector-basic-1.3.1.jar:1.3.1]                                                                            
    at org.eclipse.aether.util.concurrency.RunnableErrorForwarder$1.run(RunnableErrorForwarder.java:75) ~[maven-resolver-util-1.3.1.jar:1.3.1]                                                                                                    
    at org.eclipse.aether.connector.basic.BasicRepositoryConnector$DirectExecutor.execute(BasicRepositoryConnector.java:642) ~[maven-resolver-connector-basic-1.3.1.jar:1.3.1]                                                                    
    at org.eclipse.aether.connector.basic.BasicRepositoryConnector.get(BasicRepositoryConnector.java:262) ~[maven-resolver-connector-basic-1.3.1.jar:1.3.1]                                                                                       
    at org.eclipse.aether.internal.impl.DefaultArtifactResolver.performDownloads(DefaultArtifactResolver.java:489) ~[maven-resolver-impl-1.3.1.jar:1.3.1]                                                                                         
    at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolve(DefaultArtifactResolver.java:390) ~[maven-resolver-impl-1.3.1.jar:1.3.1]                                                                                                  
    at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolveArtifacts(DefaultArtifactResolver.java:215) ~[maven-resolver-impl-1.3.1.jar:1.3.1]                                                                                         
    at org.eclipse.aether.internal.impl.DefaultRepositorySystem.resolveDependencies(DefaultRepositorySystem.java:325) ~[maven-resolver-impl-1.3.1.jar:1.3.1]                                                                                      
    ... 4 more                                                                                                                                                                                                                                    

Looks like the druid-kubernetes-overlord-extensions haven't been bumped to version 33.0.0 yet.

christian-schlichtherle avatar Apr 30 '25 08:04 christian-schlichtherle

Looks like the druid-kubernetes-overlord-extensions haven't been bumped to version 33.0.0 yet.

No, it's been moved to the core, so I don't have to use pull-deps anymore, that's a welcome change!

christian-schlichtherle avatar Apr 30 '25 08:04 christian-schlichtherle

So this change breaks communication with nodes that are deployed as Deployment kind in Kubernetes. To recover the old behavior, I shall set DRUID_SET_HOST_IP=1. My Helm chart is not using environment variables, but generating Java system properties instead. I would like to keep it that way for consistency, so can I use -Ddruid.set.host.ip=1 instead?

Looking at the source code, it seems like the answer is "no". Here's the relevant code:

if [ -z "${KUBERNETES_SERVICE_HOST}" ]
then
  # Running outside kubernetes, use IP addresses
  DRUID_SET_HOST_IP=${DRUID_SET_HOST_IP:-1}
else
  # Running in kubernetes, so use canonical names
  DRUID_SET_HOST_IP=${DRUID_SET_HOST_IP:-0}
fi

if [ "${DRUID_SET_HOST_IP}" = "1" ]
then
    setKey $SERVICE druid.host $(ip r get 1 | awk '{print $7;exit}')
fi

Well, that clearly doesn't work for Deployment resource kinds.

Put DRUID_SET_HOST_IP in the Helm values file.

FrankChen021 avatar May 02 '25 01:05 FrankChen021

Is there something we can or should change in the bundled script to improve this case? Otherwise I suppose we should close the issue, since the cause has been figured out.

gianm avatar May 16 '25 07:05 gianm

Well, honestly, configuring Druid is no smooth ride because of its wild mix of Java system properties, environment variables, XML (for logging), JSON (for metrics) etc. Unfortunately, this is architectural, so it can't be easily fixed at the root. However, it can be abstracted over, and that's why I've created the Druid charts, where you can configure everything using YAML, even the Java system properties: https://github.com/bsure-analytics/druid-charts

I apologize for the shameless self-plug.

christian-schlichtherle avatar May 16 '25 08:05 christian-schlichtherle