Elly.jl

When workers fail to launch, how do you find out why?

Open • jpfairbanks opened this issue 9 years ago • 1 comment

In the example for the Yarn cluster manager, I get a timeout when launching the workers. How can I find out why they are failing to launch so that I can debug it? I guess I need to look at some log files somewhere.

jpfairbanks, Aug 24 '16 18:08

There may be some errors logged in the Yarn logs. In a standalone installation, these are usually in the $HADOOP_HOME/logs folder.
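
As a rough sketch of how one might skim that folder from Julia: the helper below is hypothetical (it is not part of Elly.jl), and it assumes the log files end in `.log` and mark problems with the words `ERROR` or `FATAL`, which may not match every Hadoop setup.

```julia
# Hypothetical helper, not part of Elly.jl: collect lines that look like errors
# from every *.log file directly inside `logdir`.
# Assumptions: log files end in ".log"; problems are tagged ERROR or FATAL.
function scan_logs(logdir::AbstractString; pattern::Regex = r"ERROR|FATAL")
    hits = Tuple{String,Int,String}[]
    for name in readdir(logdir)
        path = joinpath(logdir, name)
        # Skip subdirectories and files that are not *.log
        (isfile(path) && endswith(name, ".log")) || continue
        for (lineno, line) in enumerate(eachline(path))
            occursin(pattern, line) && push!(hits, (name, lineno, line))
        end
    end
    return hits
end

# Only scan if HADOOP_HOME is set and its logs folder exists.
if haskey(ENV, "HADOOP_HOME") && isdir(joinpath(ENV["HADOOP_HOME"], "logs"))
    for (file, lineno, text) in scan_logs(joinpath(ENV["HADOOP_HOME"], "logs"))
        println(file, ":", lineno, ": ", text)
    end
end
```

On a multi-node cluster the relevant logs may live elsewhere, so treat the path as a starting point rather than the definitive location.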

When launching a lot of workers, the errors are usually:

  • Not enough resources: Yarn cannot fulfill the request. Yarn would log these errors.
  • The default timeout is too short. Specify a higher launch_timeout while creating the cluster manager.
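
For the second case, a minimal sketch of raising the timeout might look like the following. The host name, the ports, the worker count, and the value 300 are placeholders for illustration, and the unit is assumed to be seconds; check the Elly.jl README for the exact keyword arguments your version accepts. It needs a running Yarn cluster to do anything.

```julia
using Distributed   # provides addprocs on Julia 1.x (it lived in Base in older versions)
using Elly

# Assumed connection details for a local, standalone Yarn installation.
# launch_timeout is raised well above the default (assumed to be in seconds).
yarncm = YarnManager(yarnhost="localhost", rmport=8032, schedport=8030,
                     launch_timeout=300)

# Request a small number of workers first; if even this times out,
# the Yarn logs are the next place to look.
addprocs(yarncm; np=2)
```

Starting with a small `np` helps separate the two causes: if a couple of workers launch fine but a large request fails, a resource shortage is the more likely culprit than the timeout.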

tanmaykm, Aug 25 '16 05:08