
Getting ClosedChannelException:null from master

Open javadba opened this issue 11 years ago • 2 comments

I tried to run simr --shell with 10 nodes. The 9 slave nodes came up, and the JobTracker UI shows they are still running, but the master instance died. The master's mapper log contains several exceptions like the following:

2014-02-18 20:30:34,990 WARN akka.actor.ActorSystemImpl: RemoteClientWriteFailed@akka://[email protected]:45121: MessageClass[scala.Tuple3] Error[java.nio.channels.ClosedChannelException:null
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:703)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:426)
    at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:116)
    at org.jboss.netty.channel.Channels.write(Channels.java:733)
    at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:65)
    at org.jboss.netty.channel.Channels.write(Channels.java:733)
    at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:65)
    at org.jboss.netty.handler.execution.ExecutionHandler.handleDownstream(ExecutionHandler.java:185)
    at org.jboss.netty.channel.Channels.write(Channels.java:712)
    at org.jboss.netty.channel.Channels.write(Channels.java:679)
    at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:246)
    at akka.remote.netty.RemoteClient.send(Client.scala:76)
    at akka.remote.netty.RemoteClient.send(Client.scala:63)
    at akka.remote.netty.NettyRemoteTransport.send(NettyRemoteSupport.scala:154)
    at akka.remote.RemoteActorRef.$bang(RemoteActorRefProvider.scala:247)
    at org.apache.spark.simr.RelayServer$$anonfun$receive$1.apply(RelayServer.scala:180)
    at org.apache.spark.simr.RelayServer$$anonfun$receive$1.apply(RelayServer.scala:151)
    at akka.actor.Actor$class.apply(Actor.scala:318)
    at org.apache.spark.simr.RelayServer.apply(RelayServer.scala:50)
    at akka.actor.ActorCell.invoke(ActorCell.scala:626)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:197)
    at akka.dispatch.Mailbox.run(Mailbox.scala:179)
    at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516)
    at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
    at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
    at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
    at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
]

Is this a configuration error or some other issue?

javadba, Feb 18 '14 20:02

Here is the log from one of the slaves:

Task Logs: 'attempt_201309101252_50100_m_000000_0'

stderr logs

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [file:/ngs4/app/mapred/tt/taskTracker/edwaetlt/distcache/-3665564613369269577_249905135_1185992748/ma-gbit-lnn11.corp.apple.com/user/edwaetlt/.staging/job_201309101252_50100/libjars/spark-assembly-hadoop-1.0.4.jar/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

syslog logs

2014-02-18 20:30:25,361 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2014-02-18 20:30:25,509 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2014-02-18 20:30:25,527 INFO org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: Sink ganglia started
2014-02-18 20:30:25,572 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2014-02-18 20:30:25,573 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2014-02-18 20:30:25,573 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2014-02-18 20:30:25,573 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2014-02-18 20:30:25,576 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2014-02-18 20:30:25,668 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2014-02-18 20:30:25,684 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1dacccc
2014-02-18 20:30:38,647 INFO akka.event.slf4j.Slf4jEventHandler: Slf4jEventHandler started
2014-02-18 20:30:38,791 INFO org.apache.spark.executor.CoarseGrainedExecutorBackend: Connecting to driver: akka://[email protected]:35581/user/CoarseGrainedScheduler
2014-02-18 20:30:38,925 INFO org.apache.spark.executor.CoarseGrainedExecutorBackend: Successfully registered with driver
2014-02-18 20:30:38,936 INFO org.apache.spark.executor.Executor: Using REPL class URI: http://17.169.56.37:47094
2014-02-18 20:30:38,972 INFO akka.event.slf4j.Slf4jEventHandler: Slf4jEventHandler started
2014-02-18 20:30:38,989 INFO org.apache.spark.SparkEnv: Connecting to BlockManagerMaster: akka://[email protected]:35581/user/BlockManagerMaster
2014-02-18 20:30:38,989 INFO akka.actor.ActorSystemImpl: RemoteServerStarted@akka://[email protected]:36362
2014-02-18 20:30:39,014 INFO org.apache.spark.storage.DiskBlockManager: Created local directory at /ngs4/app/mapred/tt/taskTracker/edwaetlt/jobcache/job_201309101252_50100/attempt_201309101252_50100_m_000000_0/work/tmp/spark-local-20140218203039-e9c8
2014-02-18 20:30:39,020 INFO org.apache.spark.storage.MemoryStore: MemoryStore started with capacity 1334.8 MB.
2014-02-18 20:30:39,048 INFO org.apache.spark.network.ConnectionManager: Bound socket to port 34921 with id = ConnectionManagerId(17.169.56.50,34921)
2014-02-18 20:30:39,052 INFO org.apache.spark.storage.BlockManagerMaster: Trying to register BlockManager
2014-02-18 20:30:39,055 INFO akka.actor.ActorSystemImpl: RemoteClientStarted@akka://[email protected]:35581
2014-02-18 20:30:39,066 INFO org.apache.spark.storage.BlockManagerMaster: Registered BlockManager
2014-02-18 20:30:39,091 INFO org.apache.spark.SparkEnv: Connecting to MapOutputTracker: akka://[email protected]:35581/user/MapOutputTracker
2014-02-18 20:30:39,101 INFO org.apache.spark.HttpFileServer: HTTP File server directory is /ngs4/app/mapred/tt/taskTracker/edwaetlt/jobcache/job_201309101252_50100/attempt_201309101252_50100_m_000000_0/work/tmp/spark-d7043dca-e00a-4972-a006-c36a28e8830d
2014-02-18 20:30:39,154 INFO org.eclipse.jetty.server.Server: jetty-7.x.y-SNAPSHOT
2014-02-18 20:30:39,171 INFO org.eclipse.jetty.server.AbstractConnector: Started [email protected]:59212

javadba, Feb 18 '14 20:02
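The slave's stderr log reports two SLF4J bindings: one inside spark-assembly-hadoop-1.0.4.jar and one in Hadoop's slf4j-log4j12-1.4.3.jar. This warning is not necessarily related to the master's ClosedChannelException. To see which jars contribute a binding on a given classpath, the following is a minimal diagnostic sketch. It assumes SLF4J 1.x, which locates its backend through the fixed resource org/slf4j/impl/StaticLoggerBinder.class. The class name Slf4jBindingCheck is hypothetical and is not part of simr or Spark.

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical helper: lists every copy of the SLF4J 1.x binder class visible to a class loader. */
public class Slf4jBindingCheck {
    // Assumption: SLF4J 1.x discovers its backend through this fixed resource path.
    static final String BINDER_PATH = "org/slf4j/impl/StaticLoggerBinder.class";

    /** Returns the URL of every classpath entry that contains the binder class. */
    static List<URL> findBindings(ClassLoader loader) throws IOException {
        List<URL> found = new ArrayList<>();
        Enumeration<URL> urls = loader.getResources(BINDER_PATH);
        while (urls.hasMoreElements()) {
            found.add(urls.nextElement());
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        List<URL> bindings = findBindings(Slf4jBindingCheck.class.getClassLoader());
        System.out.println("SLF4J bindings found: " + bindings.size());
        for (URL url : bindings) {
            System.out.println("  " + url);
        }
        if (bindings.size() > 1) {
            // Mirrors the "multiple SLF4J bindings" warning in the stderr log.
            System.out.println("More than one binding on the classpath; SLF4J will pick one and warn.");
        }
    }
}
```

Run it with the same classpath as the task (for example, the Spark assembly jar plus Hadoop's lib directory). Each printed URL should correspond to one "Found binding in" line from the stderr log.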

I ran into the same problem.

hellocodeM, Oct 09 '14 12:10