
Supervisor Task failing without any error

Open hara5 opened this issue 6 years ago • 15 comments

I created a new Kafka ingestion through the load data flow. The supervisor is in an UNHEALTHY state and every task fails within 6 seconds.

Druid version: apache-druid-0.16.0-incubating. Any pointers on this issue?


hara5 avatar Nov 13 '19 15:11 hara5

Check overlord logs

jp707049 avatar Nov 15 '19 18:11 jp707049
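For anyone following this suggestion, here is a minimal sketch of how an overlord log could be scanned once it has been located. It assumes the log uses the same line layout as the task log quoted later in this thread (timestamp, level, thread, logger, message). The function name `find_problem_lines` and the `needle` parameter are hypothetical, and the location of the log file depends on how Druid was launched, so it is not shown here.

```python
import re

# Matches the line layout seen in Druid logs, for example:
# "2020-01-27T19:58:28,140 INFO [main] org.example.Class - message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) \[(?P<thread>[^\]]*)\] "
    r"(?P<logger>\S+) - (?P<message>.*)$"
)


def find_problem_lines(log_text, needle, levels=("WARN", "ERROR")):
    """Return (timestamp, level, message) for lines at the given levels
    whose message mentions `needle` (e.g. a datasource or supervisor id)."""
    hits = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match is None:
            # Stack-trace continuation lines and blank lines are skipped.
            continue
        if match["level"] in levels and needle in match["message"]:
            hits.append((match["ts"], match["level"], match["message"]))
    return hits
```

Passing the datasource name as `needle` narrows the output to lines about one supervisor. Lines whose thread name contains nested brackets will not match this pattern and are skipped.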

I am running into the same issue, but I cannot find the logs anywhere. Have you figured out what your problem was?

alexandra-diaconu avatar Nov 17 '19 03:11 alexandra-diaconu

unable to figure it out yet

hara5 avatar Nov 18 '19 03:11 hara5

@alexandra-diaconu Were you able to figure out the issue?

hara5 avatar Jan 06 '20 10:01 hara5

Have the exact same problem!

selfeky avatar Jan 27 '20 15:01 selfeky

@hara5 @selfeky what do the overlord logs say?

suneet-s avatar Jan 27 '20 16:01 suneet-s

@suneet-s Not sure! The task log returned by druid/indexer/v1/task/{task_id}/log ends with:

  "id" : "index_kafka_groups_a577fe538149f50_ofconddn",
  "status" : "SUCCESS",
  "duration" : 301981,
  "errorMsg" : null,
  "location" : {
    "host" : null,
    "port" : -1,
    "tlsPort" : -1
  }
}
2020-01-27T19:58:28,107 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [ANNOUNCEMENTS]
2020-01-27T19:58:28,110 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.announcement.Announcer.stop()] on object[org.apache.druid.curator.announcement.Announcer@2965dd88].
2020-01-27T19:58:28,114 INFO [main] org.apache.druid.curator.announcement.Announcer - Stopping announcer
2020-01-27T19:58:28,117 INFO [main] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/listeners/lookups/__default/http:10.12.0.29:8102]
2020-01-27T19:58:28,123 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [SERVER]
2020-01-27T19:58:28,123 INFO [main] org.apache.druid.server.initialization.jetty.JettyServerModule - Stopping Jetty Server...
2020-01-27T19:58:28,129 INFO [main] org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@4cd5fc46{HTTP/1.1,[http/1.1]}{0.0.0.0:8102}
2020-01-27T19:58:28,129 INFO [main] org.eclipse.jetty.server.session - node0 Stopped scavenging
2020-01-27T19:58:28,131 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@2faf6e4a{/,null,UNAVAILABLE}
2020-01-27T19:58:28,139 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [NORMAL]
2020-01-27T19:58:28,139 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.server.listener.announcer.ListenerResourceAnnouncer.stop()] on object[org.apache.druid.query.lookup.LookupResourceListenerAnnouncer@15e08615].
2020-01-27T19:58:28,139 INFO [main] org.apache.druid.server.listener.announcer.ListenerResourceAnnouncer - Unannouncing start time on [/druid/listeners/lookups/__default/http:10.12.0.29:8102]
2020-01-27T19:58:28,140 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.indexing.worker.executor.ExecutorLifecycle.stop() throws java.lang.Exception] on object[org.apache.druid.indexing.worker.executor.ExecutorLifecycle@762405bf].
2020-01-27T19:58:28,140 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner.stop()] on object[org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner@5103eea2].
2020-01-27T19:58:28,140 INFO [main] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Starting graceful shutdown of task[index_kafka_groups_a577fe538149f50_ofconddn].
2020-01-27T19:58:28,140 INFO [main] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Stopping forcefully (status: [PUBLISHING])
2020-01-27T19:58:28,140 INFO [main] org.apache.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_groups_a577fe538149f50_ofconddn] status changed to [FAILED].
2020-01-27T19:58:28,140 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.indexing.worker.IntermediaryDataManager.stop() throws java.lang.InterruptedException] on object[org.apache.druid.indexing.worker.IntermediaryDataManager@32dcfeea].
2020-01-27T19:58:28,143 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.client.cache.CaffeineCache.close()] on object[org.apache.druid.client.cache.CaffeineCache@2c1a8529].
2020-01-27T19:58:28,145 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.discovery.DruidLeaderClient.stop()] on object[org.apache.druid.discovery.DruidLeaderClient@4536a715].
2020-01-27T19:58:28,145 INFO [main] org.apache.druid.discovery.DruidLeaderClient - Stopped.
2020-01-27T19:58:28,145 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.discovery.ServerDiscoverySelector.stop() throws java.io.IOException] on object[org.apache.druid.curator.discovery.ServerDiscoverySelector@46ea78f0].
2020-01-27T19:58:28,148 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.storage.hdfs.HdfsStorageAuthentication.stop()] on object[org.apache.druid.storage.hdfs.HdfsStorageAuthentication@184751f3].
2020-01-27T19:58:28,148 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.java.util.metrics.MonitorScheduler.stop()] on object[org.apache.druid.java.util.metrics.MonitorScheduler@786a3477].
2020-01-27T19:58:28,148 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.query.lookup.LookupReferencesManager.stop()] on object[org.apache.druid.query.lookup.LookupReferencesManager@fabef2e].
2020-01-27T19:58:28,148 INFO [main] org.apache.druid.query.lookup.LookupReferencesManager - LookupExtractorFactoryContainerProvider is stopping.
2020-01-27T19:58:28,148 INFO [LookupExtractorFactoryContainerProvider-MainThread] org.apache.druid.query.lookup.LookupReferencesManager - Lookup Management loop exited, Lookup notices are not handled anymore.
2020-01-27T19:58:28,149 INFO [main] org.apache.druid.query.lookup.LookupReferencesManager - LookupExtractorFactoryContainerProvider is stopped.
2020-01-27T19:58:28,149 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.discovery.DruidLeaderClient.stop()] on object[org.apache.druid.discovery.DruidLeaderClient@5cc3e49b].
2020-01-27T19:58:28,149 INFO [main] org.apache.druid.discovery.DruidLeaderClient - Stopped.
2020-01-27T19:58:28,149 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.discovery.ServerDiscoverySelector.stop() throws java.io.IOException] on object[org.apache.druid.curator.discovery.ServerDiscoverySelector@451816fd].
2020-01-27T19:58:28,150 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider.stop()] on object[org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider@656ec00d].
2020-01-27T19:58:28,150 INFO [main] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider - stopping
2020-01-27T19:58:28,151 INFO [main] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider - stopped
2020-01-27T19:58:28,151 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.java.util.http.client.NettyHttpClient.stop()] on object[org.apache.druid.java.util.http.client.NettyHttpClient@384472bf].
2020-01-27T19:58:28,212 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.java.util.emitter.service.ServiceEmitter.close() throws java.io.IOException] on object[ServiceEmitter{serviceDimensions={service=druid/coordinator, host=10.12.0.29:8102, version=0.16.1-incubating}, emitter=org.apache.druid.java.util.emitter.core.NoopEmitter@44d64d4e}].
2020-01-27T19:58:28,212 INFO [main] org.apache.druid.curator.CuratorModule - Stopping Curator
2020-01-27T19:58:28,215 INFO [Curator-Framework-0] org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
2020-01-27T19:58:28,220 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x1000014b04d01a5
2020-01-27T19:58:28,221 INFO [main] org.apache.zookeeper.ZooKeeper - Session: 0x1000014b04d01a5 closed
2020-01-27T19:58:28,221 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [INIT]
2020-01-27T19:58:28,221 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner.stop()] on object[org.apache.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner@27bb4dc5].
Finished peon task

selfeky avatar Jan 27 '20 19:01 selfeky
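In the log above, the status JSON reports SUCCESS, while later lines report a forced stop during PUBLISHING and a status change to FAILED. A minimal sketch for pulling those lines out of a long task log follows. The regular expressions are written against the exact messages quoted above, the endpoint path is the one quoted in the comment, and the function names are hypothetical. It only surfaces the lines and does not explain the cause.

```python
import re
import urllib.parse

# Patterns taken from the two log messages quoted in this thread.
STATUS_CHANGE = re.compile(
    r"Task \[(?P<task>[^\]]+)\] status changed to \[(?P<status>\w+)\]"
)
FORCED_STOP = re.compile(r"Stopping forcefully \(status: \[(?P<phase>\w+)\]\)")


def task_log_path(task_id):
    """Build the task log path quoted in the comment above."""
    return f"/druid/indexer/v1/task/{urllib.parse.quote(task_id, safe='')}/log"


def summarize_task_log(log_text):
    """Collect status changes and forced-stop phases found in a task log."""
    summary = {"status_changes": [], "forced_stop_phases": []}
    for line in log_text.splitlines():
        change = STATUS_CHANGE.search(line)
        if change:
            summary["status_changes"].append((change["task"], change["status"]))
        forced = FORCED_STOP.search(line)
        if forced:
            summary["forced_stop_phases"].append(forced["phase"])
    return summary
```

Run against the log above, this would list the FAILED transition and the PUBLISHING phase in which the task was stopped, which are easy to miss among the lifecycle shutdown messages.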

@selfeky Did you resolve this issue? I am having the same issue on apache-druid-0.16.0-incubating.

vikramarsid avatar Apr 23 '20 19:04 vikramarsid

@vikramarsid No, unfortunately.

selfeky avatar Apr 26 '20 14:04 selfeky

@alexandra-diaconu, @selfeky, @hara5 were you able to solve the issue? I am facing the same issue.

rishibhutada avatar Mar 25 '21 05:03 rishibhutada

@rishibhutada No, unfortunately.

selfeky avatar Mar 25 '21 13:03 selfeky

Same issue. Any help?

nikhilamunipalli avatar Jul 12 '21 14:07 nikhilamunipalli

same for me

horidon avatar Aug 04 '21 16:08 horidon

Has this been solved? We are facing the same issue. A hard reset might temporarily solve it, but the supervisors then go into the UNHEALTHY_TASKS state. It isn't stable. Any help or suggestions are appreciated.

tej24 avatar Sep 28 '22 17:09 tej24
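For reference, a minimal sketch of issuing a reset from a script, assuming the "hard reset" mentioned above refers to the Overlord's supervisor reset endpoint (`POST /druid/indexer/v1/supervisor/{supervisorId}/reset`). The base URL and the helper name `build_reset_request` are assumptions for illustration. Note that a reset clears the stored offsets, so data may be skipped or duplicated when the supervisor resumes.

```python
import urllib.parse
import urllib.request


def build_reset_request(overlord_url, supervisor_id):
    """Build (but do not send) a POST request for the supervisor reset endpoint.

    `overlord_url` is an assumption, e.g. "http://localhost:8090"; use whatever
    address your Overlord or Router is actually reachable at.
    """
    path = (
        "/druid/indexer/v1/supervisor/"
        f"{urllib.parse.quote(supervisor_id, safe='')}/reset"
    )
    return urllib.request.Request(
        overlord_url.rstrip("/") + path, data=b"", method="POST"
    )


# Sending is left to the caller, for example:
#   with urllib.request.urlopen(build_reset_request(url, "my_supervisor")) as resp:
#       print(resp.status)
```

Keeping request construction separate from sending makes it possible to inspect the URL before running a destructive operation against a live cluster.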

What version are you on? I would suggest upgrading to the latest version, as many bug fixes have been made and error messages have improved considerably over time. If you still see the problem on the latest version, the overlord and task logs can offer more clues as to what is happening.

abhishekagarwal87 avatar Sep 29 '22 06:09 abhishekagarwal87
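Alongside the logs, the Overlord's supervisor status endpoint (`GET /druid/indexer/v1/supervisor/{supervisorId}/status`) can also be queried. The sketch below is a rough illustration: the base URL and function names are assumptions, and the payload fields it reads (`state`, `detailedState`, `healthy`, `recentErrors`) are accessed defensively because their presence may depend on the Druid version.

```python
import json
import urllib.parse
import urllib.request


def supervisor_status_url(overlord_url, supervisor_id):
    """Build the supervisor status URL for a given Overlord base URL."""
    quoted = urllib.parse.quote(supervisor_id, safe="")
    return f"{overlord_url.rstrip('/')}/druid/indexer/v1/supervisor/{quoted}/status"


def summarize_status(status):
    """Pick out the health-related fields from a status response dict."""
    payload = status.get("payload") or {}
    return {
        "state": payload.get("state"),
        "detailedState": payload.get("detailedState"),
        "healthy": payload.get("healthy"),
        # Each recent error is assumed to carry a "message" field.
        "recentErrors": [e.get("message") for e in payload.get("recentErrors") or []],
    }


def fetch_supervisor_status(overlord_url, supervisor_id, timeout=10):
    """Fetch and summarize the status; requires a reachable Overlord."""
    url = supervisor_status_url(overlord_url, supervisor_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return summarize_status(json.load(resp))
```

If `recentErrors` is populated, its messages are often a quicker starting point than searching through the full overlord log.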

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

github-actions[bot] avatar Aug 03 '23 00:08 github-actions[bot]

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

github-actions[bot] avatar Sep 01 '23 00:09 github-actions[bot]