incubator-livy icon indicating copy to clipboard operation
incubator-livy copied to clipboard

[LIVY-336][SERVER] Livy should not spawn one thread per job to track the job on Yarn

Open runzhiwang opened this issue 5 years ago • 12 comments

What changes were proposed in this pull request?

Instead of spawning a monitor thread for every session, create a centralized thread to monitor all apps.

How was this patch tested?

Existed ITs and UTs.

runzhiwang avatar Oct 06 '19 04:10 runzhiwang

Codecov Report

Merging #242 into master will increase coverage by 0.14%. The diff coverage is 77.14%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #242      +/-   ##
============================================
+ Coverage     68.09%   68.23%   +0.14%     
- Complexity      939      949      +10     
============================================
  Files           102      102              
  Lines          5883     5935      +52     
  Branches        893      904      +11     
============================================
+ Hits           4006     4050      +44     
- Misses         1302     1310       +8     
  Partials        575      575
Impacted Files Coverage Δ Complexity Δ
...rver/src/main/scala/org/apache/livy/LivyConf.scala 96.07% <100%> (+0.05%) 21 <0> (ø) :arrow_down:
...ain/scala/org/apache/livy/utils/SparkYarnApp.scala 76.07% <76.47%> (+1.07%) 50 <7> (+10) :arrow_up:
...cala/org/apache/livy/scalaapi/ScalaJobHandle.scala 52.94% <0%> (-2.95%) 7% <0%> (ø)
...c/src/main/java/org/apache/livy/rsc/RSCClient.java 73.49% <0%> (+1.2%) 20% <0%> (ø) :arrow_down:
...ala/org/apache/livy/scalaapi/LivyScalaClient.scala 86.66% <0%> (+3.33%) 7% <0%> (ø) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 319386a...3144ff1. Read the comment docs.

codecov-io avatar Oct 06 '19 05:10 codecov-io

@jahstreet Thank you for your comments. I have updated the PR.

runzhiwang avatar Nov 08 '19 11:11 runzhiwang

@runzhiwang Yeah, I see, will go through once again during weekend. Good job man!

jahstreet avatar Nov 08 '19 11:11 jahstreet

I still need a close look to see the detailed implementations.

jerryshao avatar Nov 14 '19 02:11 jerryshao

I'm OK with the current changes, @yiheng would you please also take a look.

jerryshao avatar Nov 20 '19 02:11 jerryshao

Looks ok from my side, @yiheng please help to review again.

jerryshao avatar Nov 22 '19 03:11 jerryshao

@jahstreet Thank you very much, I have updated according to your comments.

runzhiwang avatar Dec 06 '19 11:12 runzhiwang

@runzhiwang can you test your patch on a Yarn cluster?

jahstreet avatar Dec 06 '19 12:12 jahstreet

@jahstreet OK.

runzhiwang avatar Dec 06 '19 12:12 runzhiwang

@jahstreet Hi, I have test the patch, and it works fine.

runzhiwang avatar Dec 07 '19 08:12 runzhiwang

Hey @runzhiwang what's the status of this PR? Are you going to merge this in? I'm having issues with dangling threads with a lot of jobs running simultaneously and this fix will be great to have!

sanchay0 avatar Apr 16 '20 01:04 sanchay0

@sanchay0 Thanks for your attention. I have completed the code. And It just need more review.

runzhiwang avatar Apr 16 '20 01:04 runzhiwang