incubator-livy icon indicating copy to clipboard operation
incubator-livy copied to clipboard

[WIP]: [Livy-336]: Livy should not spawn one thread per job to track the job on Yarn (Use YARN REST API)

Open meisam opened this issue 7 years ago • 8 comments

What changes were proposed in this pull request?

Currently, Livy spawns one thread to for each Spark application to track the status of the application on YARN. The threads are called yarnAppMonitorThread-xxxx and make multiple calls to YARN using YARN's Java API to get the status of running applications or to kill applications. Spawning so many threads is a bottle neck and limits the number of Spark applications that can be submitted in a short span of time.

This pull request addresses the above issue in 2 ways:

  1. It creates only one thread to poll YARN and get the status of all Spark applications.
  2. It replaces the Java API with YARN REST API to eliminate multiple calls to YARN.

How was this patch tested?

  • The existing unit tests and integration test are updated to reflect the changes in the code.
  • Every iteration of this PR is tested internally at PayPal at dev and production development. Some iterations have been running in productions for months.

What is not in the scope of this pull request?

Based on the discussion and feedback on this pull request, and based on the feedback

  • Calls to yarnClient.kill wont be replaced with REST API to kill YARN apps because:
    • The current kill REST API is experimental.
  • Maven dependencies to YARN won't be removed because:
    • There is minimal benefit in removing all YARN dependencies. Livy still depends on Hadoop libraries.
    • If we remove YARN dependencies, we should find a different way to get the Name Node URL. One way to do that is to add new parameters in `livy.con, but that adds complexity and possibly confusion.

Task-url: https://issues.apache.org/jira/browse/LIVY-336

meisam avatar Sep 06 '17 18:09 meisam

we can drop all maven dependencies to YARN

Not sure that brings a lot of gains; unless something changed, Livy uses a bunch of HDFS APIs already, so removing YARN would probably not remove a whole lot of dependencies.

vanzin avatar Sep 06 '17 18:09 vanzin

Not sure that brings a lot of gains; unless something changed, Livy uses a bunch of HDFS APIs already, so removing YARN would probably not remove a whole lot of dependencies.

On top of that, if we remove all dependencies to YARN, we have to put the YARN cofings, such as resouce manager REST URL, in livy.conf. If we keep the dependencies, we can read the configs from yarn-site.xml and build the REST URL.

meisam avatar Sep 06 '17 18:09 meisam

Codecov Report

Merging #44 into master will decrease coverage by 4.2%. The diff coverage is 47.54%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master      #44      +/-   ##
============================================
- Coverage     70.73%   66.52%   -4.21%     
+ Complexity      788      787       -1     
============================================
  Files            97       99       +2     
  Lines          5388     5556     +168     
  Branches        802      850      +48     
============================================
- Hits           3811     3696     -115     
- Misses         1037     1323     +286     
+ Partials        540      537       -3
Impacted Files Coverage Δ Complexity Δ
...main/scala/org/apache/livy/server/LivyServer.scala 1.75% <0%> (-35.31%) 2 <0> (-8)
...rver/src/main/scala/org/apache/livy/LivyConf.scala 92.96% <100%> (-2.97%) 12 <0> (-3)
...rc/main/scala/org/apache/livy/utils/SparkApp.scala 53.12% <37.5%> (-27.65%) 1 <0> (ø)
...in/scala/org/apache/livy/utils/YarnInterface.scala 42.23% <42.23%> (ø) 20 <20> (?)
...ain/scala/org/apache/livy/utils/SparkYarnApp.scala 30.26% <59.25%> (-31.71%) 33 <22> (+4)
...cala/org/apache/livy/utils/ApplicationReport.scala 93.75% <93.75%> (ø) 1 <1> (?)
...r/src/main/scala/org/apache/livy/utils/Clock.scala 42.85% <0%> (-57.15%) 0% <0%> (ø)
core/src/main/scala/org/apache/livy/Logging.scala 54.54% <0%> (-18.19%) 0% <0%> (ø)
... and 25 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 05bfa15...1555157. Read the comment docs.

codecov-io avatar Sep 06 '17 21:09 codecov-io

What's the benefit of using REST API?

jerryshao avatar Sep 08 '17 08:09 jerryshao

The main benefit is that with one call we can get all the info we need for the app, that is appId, state, finalStatus, diagnostics, trackingUrl, SparkUiUrl, etc. With Yarn's Java API, we can get appId with one call, but to get trakingURL, SparkUiUrl, etc., we have to make extra calls. First we need to get the attemptId, then we need to get the containerId, and then we can get the remaining information.

There is one other potential benefit, which is we can drop dependencies to YARN. But, as Marcelo described, the gains from dropping YARN dependencies are minimal.

I am alright with either YARN's Java API or REST API. I just added this PR because the proposal on the mailing list recommended REST.

meisam avatar Sep 08 '17 16:09 meisam

Thanks @ajbozarth. This was helpful feedback.

meisam avatar Sep 18 '17 20:09 meisam

I don't have a strong preference on changing to REST APIs, except one advantage you mentioned above (get all infos in one request). I'm not sure usually how many application will we launch in one Livy server, if it is only tens or hundreds of application, I think the communication overhead should not be big.

Also avoiding yarn dependency is not a big problem, anyway we still need to depend on hadoop jars.

One difference I can think of is to visit security Hadoop, now because we change to REST API, so we should provide spnego principal/keytab, rather than Livy principal/keytab.

jerryshao avatar Sep 19 '17 03:09 jerryshao

I don't have a strong preference on changing to REST APIs, except one advantage you mentioned above (get all infos in one request).

The main gain from this PR is LIVY-336: Livy should not spawn one thread per job to track the job on Yarn.

One difference I can think of is to visit security Hadoop, now because we change to REST API, so we should provide spnego principal/keytab, rather than Livy principal/keytab.

YARN's Cluster Applications API does not need authentication with one exception: deleting a YARN app needs authentication. But the delete API is experimental and I am not sure if it is a good idea to use it.

Since dropping the YARN dependencies has minimal benefits, it makes sense to kill YARN apps with java

meisam avatar Sep 19 '17 12:09 meisam