Supervisor Task failing without any error
I created a new Kafka data load, but the supervisor is in an UNHEALTHY state and all of its tasks fail within 6 seconds.
Druid version: apache-druid-0.16.0-incubating
Any pointers on this issue?



Check overlord logs
I am running into the same issue, but I cannot find the logs anywhere. Have you figured out what your problem was?
I haven't been able to figure it out yet.
@alexandra-diaconu Were you able to figure out the issue?
Have the exact same problem!
@hara5 @selfeky what do the overlord logs say?
@suneet-s Not sure!
The task log from druid/indexer/v1/task/{task_id}/log ends with:
"id" : "index_kafka_groups_a577fe538149f50_ofconddn",
"status" : "SUCCESS",
"duration" : 301981,
"errorMsg" : null,
"location" : {
"host" : null,
"port" : -1,
"tlsPort" : -1
}
}
2020-01-27T19:58:28,107 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [ANNOUNCEMENTS]
2020-01-27T19:58:28,110 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.announcement.Announcer.stop()] on object[org.apache.druid.curator.announcement.Announcer@2965dd88].
2020-01-27T19:58:28,114 INFO [main] org.apache.druid.curator.announcement.Announcer - Stopping announcer
2020-01-27T19:58:28,117 INFO [main] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/listeners/lookups/__default/http:10.12.0.29:8102]
2020-01-27T19:58:28,123 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [SERVER]
2020-01-27T19:58:28,123 INFO [main] org.apache.druid.server.initialization.jetty.JettyServerModule - Stopping Jetty Server...
2020-01-27T19:58:28,129 INFO [main] org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@4cd5fc46{HTTP/1.1,[http/1.1]}{0.0.0.0:8102}
2020-01-27T19:58:28,129 INFO [main] org.eclipse.jetty.server.session - node0 Stopped scavenging
2020-01-27T19:58:28,131 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@2faf6e4a{/,null,UNAVAILABLE}
2020-01-27T19:58:28,139 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [NORMAL]
2020-01-27T19:58:28,139 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.server.listener.announcer.ListenerResourceAnnouncer.stop()] on object[org.apache.druid.query.lookup.LookupResourceListenerAnnouncer@15e08615].
2020-01-27T19:58:28,139 INFO [main] org.apache.druid.server.listener.announcer.ListenerResourceAnnouncer - Unannouncing start time on [/druid/listeners/lookups/__default/http:10.12.0.29:8102]
2020-01-27T19:58:28,140 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.indexing.worker.executor.ExecutorLifecycle.stop() throws java.lang.Exception] on object[org.apache.druid.indexing.worker.executor.ExecutorLifecycle@762405bf].
2020-01-27T19:58:28,140 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner.stop()] on object[org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner@5103eea2].
2020-01-27T19:58:28,140 INFO [main] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Starting graceful shutdown of task[index_kafka_groups_a577fe538149f50_ofconddn].
2020-01-27T19:58:28,140 INFO [main] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Stopping forcefully (status: [PUBLISHING])
2020-01-27T19:58:28,140 INFO [main] org.apache.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_groups_a577fe538149f50_ofconddn] status changed to [FAILED].
2020-01-27T19:58:28,140 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.indexing.worker.IntermediaryDataManager.stop() throws java.lang.InterruptedException] on object[org.apache.druid.indexing.worker.IntermediaryDataManager@32dcfeea].
2020-01-27T19:58:28,143 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.client.cache.CaffeineCache.close()] on object[org.apache.druid.client.cache.CaffeineCache@2c1a8529].
2020-01-27T19:58:28,145 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.discovery.DruidLeaderClient.stop()] on object[org.apache.druid.discovery.DruidLeaderClient@4536a715].
2020-01-27T19:58:28,145 INFO [main] org.apache.druid.discovery.DruidLeaderClient - Stopped.
2020-01-27T19:58:28,145 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.discovery.ServerDiscoverySelector.stop() throws java.io.IOException] on object[org.apache.druid.curator.discovery.ServerDiscoverySelector@46ea78f0].
2020-01-27T19:58:28,148 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.storage.hdfs.HdfsStorageAuthentication.stop()] on object[org.apache.druid.storage.hdfs.HdfsStorageAuthentication@184751f3].
2020-01-27T19:58:28,148 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.java.util.metrics.MonitorScheduler.stop()] on object[org.apache.druid.java.util.metrics.MonitorScheduler@786a3477].
2020-01-27T19:58:28,148 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.query.lookup.LookupReferencesManager.stop()] on object[org.apache.druid.query.lookup.LookupReferencesManager@fabef2e].
2020-01-27T19:58:28,148 INFO [main] org.apache.druid.query.lookup.LookupReferencesManager - LookupExtractorFactoryContainerProvider is stopping.
2020-01-27T19:58:28,148 INFO [LookupExtractorFactoryContainerProvider-MainThread] org.apache.druid.query.lookup.LookupReferencesManager - Lookup Management loop exited, Lookup notices are not handled anymore.
2020-01-27T19:58:28,149 INFO [main] org.apache.druid.query.lookup.LookupReferencesManager - LookupExtractorFactoryContainerProvider is stopped.
2020-01-27T19:58:28,149 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.discovery.DruidLeaderClient.stop()] on object[org.apache.druid.discovery.DruidLeaderClient@5cc3e49b].
2020-01-27T19:58:28,149 INFO [main] org.apache.druid.discovery.DruidLeaderClient - Stopped.
2020-01-27T19:58:28,149 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.discovery.ServerDiscoverySelector.stop() throws java.io.IOException] on object[org.apache.druid.curator.discovery.ServerDiscoverySelector@451816fd].
2020-01-27T19:58:28,150 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider.stop()] on object[org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider@656ec00d].
2020-01-27T19:58:28,150 INFO [main] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider - stopping
2020-01-27T19:58:28,151 INFO [main] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider - stopped
2020-01-27T19:58:28,151 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.java.util.http.client.NettyHttpClient.stop()] on object[org.apache.druid.java.util.http.client.NettyHttpClient@384472bf].
2020-01-27T19:58:28,212 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.java.util.emitter.service.ServiceEmitter.close() throws java.io.IOException] on object[ServiceEmitter{serviceDimensions={service=druid/coordinator, host=10.12.0.29:8102, version=0.16.1-incubating}, emitter=org.apache.druid.java.util.emitter.core.NoopEmitter@44d64d4e}].
2020-01-27T19:58:28,212 INFO [main] org.apache.druid.curator.CuratorModule - Stopping Curator
2020-01-27T19:58:28,215 INFO [Curator-Framework-0] org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
2020-01-27T19:58:28,220 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x1000014b04d01a5
2020-01-27T19:58:28,221 INFO [main] org.apache.zookeeper.ZooKeeper - Session: 0x1000014b04d01a5 closed
2020-01-27T19:58:28,221 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [INIT]
2020-01-27T19:58:28,221 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner.stop()] on object[org.apache.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner@27bb4dc5].
Finished peon task
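For anyone else pulling a task log this way, here is a minimal sketch of how the log could be fetched and narrowed down to the lines worth reading. It uses the task log path quoted above; the Overlord address `http://localhost:8090` and the helper names `task_log_url`, `interesting_lines`, and `fetch_task_log` are assumptions for illustration, not part of Druid. The filter keeps WARN and ERROR lines, `Caused by:` lines from stack traces, and the "status changed to [FAILED]" line that appears in the log above.

```python
import re
import urllib.parse
import urllib.request
from typing import List

# Assumed Overlord address; replace with your own host and port.
OVERLORD = "http://localhost:8090"

# Log lines look like: "<timestamp> <LEVEL> [thread] logger - message"
LEVEL_RE = re.compile(r"^\S+\s+(WARN|ERROR)\s")


def task_log_url(overlord: str, task_id: str) -> str:
    """Build the task log URL from the API path quoted in this thread."""
    quoted = urllib.parse.quote(task_id, safe="")
    return f"{overlord.rstrip('/')}/druid/indexer/v1/task/{quoted}/log"


def interesting_lines(log_text: str) -> List[str]:
    """Keep WARN/ERROR lines, stack trace causes, and FAILED status changes."""
    keep = []
    for line in log_text.splitlines():
        if (
            LEVEL_RE.match(line)
            or line.lstrip().startswith("Caused by:")
            or "status changed to [FAILED]" in line
        ):
            keep.append(line)
    return keep


def fetch_task_log(overlord: str, task_id: str) -> str:
    """Download the full task log as text (hypothetical helper)."""
    with urllib.request.urlopen(task_log_url(overlord, task_id), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `interesting_lines(fetch_task_log(OVERLORD, task_id))` would skip the routine INFO shutdown lines. If nothing but the FAILED status line comes back, as seems to be the case for the log above, the cause is probably recorded elsewhere, for example in the overlord log.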
@selfeky Did you resolve this issue? I am having the same issue on apache-druid-0.16.0-incubating.
@vikramarsid no unfortunately
@alexandra-diaconu @selfeky @hara5 Were you able to solve the issue? I am facing the same issue.
@rishibhutada no unfortunately
Same issue. Any help?
Same for me.
Has this been solved? We are facing the same issue. A hard reset might temporarily solve it, but the supervisors get into unhealthy_tasks status. It isn't stable. Any help or suggestions are appreciated.
What version are you on? I would suggest upgrading to the latest version, as many bug fixes have been made and error messages have improved a lot over time. If you still see it on the latest version, the overlord and task logs can offer more clues as to what is happening.
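Besides the logs, the supervisor status report may already name the failure. Below is a rough sketch that reads it from the Overlord. It assumes the status endpoint `/druid/indexer/v1/supervisor/{supervisorId}/status` and a response with `state`, `detailedState`, and `recentErrors` under `payload`; those field names may differ between Druid versions, and the Overlord address and the helper names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict


def supervisor_status_url(overlord: str, supervisor_id: str) -> str:
    """Build the supervisor status URL (endpoint path is an assumption)."""
    quoted = urllib.parse.quote(supervisor_id, safe="")
    return f"{overlord.rstrip('/')}/druid/indexer/v1/supervisor/{quoted}/status"


def summarize_status(status: Dict[str, Any]) -> Dict[str, Any]:
    """Pull out the fields most useful for debugging an unhealthy supervisor."""
    payload = status.get("payload") or {}
    errors = payload.get("recentErrors") or []
    return {
        "state": payload.get("state"),
        "detailedState": payload.get("detailedState"),
        # Each error entry is assumed to carry a human-readable "message".
        "recentErrors": [e.get("message") for e in errors],
    }


def fetch_supervisor_status(overlord: str, supervisor_id: str) -> Dict[str, Any]:
    """Fetch and decode the status report as JSON (hypothetical helper)."""
    url = supervisor_status_url(overlord, supervisor_id)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

For example, `summarize_status(fetch_supervisor_status("http://localhost:8090", "my_datasource"))` would show whether the supervisor reports any recent errors. For Kafka ingestion the supervisor ID is normally the datasource name.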
This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.
This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.