gobblin icon indicating copy to clipboard operation
gobblin copied to clipboard

[GOBBLIN-1083] Unit test improving & return failed when helix task cancelled

Open autumnust opened this issue 5 years ago • 4 comments

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

  • https://issues.apache.org/jira/browse/GOBBLIN-1083

Description

This PR contains several changes:

  • To increase the chance of a cancelled task to be re-balanced by Helix, for cancelled task we marked it as "FAILED" instead of "CANCELLED" in GobblinHelixTask.
  • Add a return value of waitJobCompletion to reflect if the job is successfully finished or not.
  • In SleepTask, instead of propagating throwable which is not handled properly, we call failTask which sets WorkUnitState.WorkingState.FAILED and ConfigurationKeys.TASK_FAILURE_EXCEPTION_KEY, the latter will be used to handled propagated exception properly in org.apache.gobblin.runtime.TaskState#setTaskFailureException, so in the test routine it is handy in terms of expected failure handling. The same for FailedTask. Simply put, before the change, we used customized task framework in SingleTask for integration test but they don't handle failure case properly.
  • Some renaming / documentation.

Tests

  • [ ] My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

autumnust avatar Mar 12 '20 05:03 autumnust

Can you add a 'Description' in the PR. I did not understand why you are trying to make a HelixTask return Failed when it is cancelled?

arjun4084346 avatar Mar 31 '20 21:03 arjun4084346

@arjun4084346 sorry about missing the description. Adding some explanation on those, specifically returning FAILED in case of CANCELLED is recommended by Helix to maximize the chance of a cancelled task to be rescheduled, so that in Gobblin-Streaming we don't have a kafka partition ( part of cancelled helix) being orphaned.

@arjun4084346 Can you also take a look the PR ? Thanks.

autumnust avatar Apr 04 '20 01:04 autumnust

I see. Just keep that in mind that sometimes, we do not want to reschedule it, e.g. when user cancelled the job in GaaS. Is there a way to do that? Will take a deeper look on Monday. Happy Friday!

arjun4084346 avatar Apr 04 '20 01:04 arjun4084346

Codecov Report

Merging #2923 into master will decrease coverage by 0.00%. The diff coverage is 37.50%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #2923      +/-   ##
============================================
- Coverage     45.59%   45.59%   -0.01%     
+ Complexity     9160     9159       -1     
============================================
  Files          1936     1936              
  Lines         73295    73300       +5     
  Branches       8088     8088              
============================================
  Hits          33418    33418              
- Misses        36781    36787       +6     
+ Partials       3096     3095       -1     
Impacted Files Coverage Δ Complexity Δ
...a/org/apache/gobblin/cluster/GobblinHelixTask.java 77.61% <0.00%> (ø) 6.00 <0.00> (ø)
...e/gobblin/cluster/TaskRunnerSuiteProcessModel.java 100.00% <ø> (ø) 5.00 <0.00> (ø)
...he/gobblin/writer/FineGrainedWatermarkTracker.java 82.25% <ø> (ø) 28.00 <0.00> (ø)
...src/main/java/org/apache/gobblin/runtime/Task.java 67.26% <ø> (ø) 84.00 <0.00> (ø)
.../apache/gobblin/runtime/task/BaseAbstractTask.java 52.38% <0.00%> (-8.74%) 6.00 <0.00> (ø)
...va/org/apache/gobblin/runtime/task/FailedTask.java 0.00% <0.00%> (ø) 0.00 <0.00> (ø)
...in/java/org/apache/gobblin/cluster/HelixUtils.java 37.30% <50.00%> (-3.18%) 15.00 <0.00> (-1.00)
.../java/org/apache/gobblin/cluster/SleepingTask.java 48.48% <80.00%> (+9.09%) 3.00 <0.00> (ø)
.../org/apache/gobblin/metrics/RootMetricContext.java 78.12% <0.00%> (-1.57%) 15.00% <0.00%> (-1.00%)
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 9ef461a...cc02f49. Read the comment docs.

codecov-io avatar Apr 04 '20 01:04 codecov-io