gobblin
gobblin copied to clipboard
[GOBBLIN-1083] Unit test improving & return failed when helix task cancelled
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
- https://issues.apache.org/jira/browse/GOBBLIN-1083
Description
This PR contains several changes:
- To increase the chance of a cancelled task to be re-balanced by Helix, for cancelled task we marked it as "FAILED" instead of "CANCELLED" in GobblinHelixTask.
- Add a return value of
waitJobCompletionto reflect if the job is successfully finished or not. - In SleepTask, instead of propagating throwable which is not handled properly, we call
failTaskwhich setsWorkUnitState.WorkingState.FAILEDandConfigurationKeys.TASK_FAILURE_EXCEPTION_KEY, the latter will be used to handled propagated exception properly in org.apache.gobblin.runtime.TaskState#setTaskFailureException, so in the test routine it is handy in terms of expected failure handling. The same forFailedTask. Simply put, before the change, we used customized task framework inSingleTaskfor integration test but they don't handle failure case properly. - Some renaming / documentation.
Tests
- [ ] My PR adds the following unit tests OR does not need testing for this extremely good reason:
Commits
- [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
- Subject is separated from body by a blank line
- Subject is limited to 50 characters
- Subject does not end with a period
- Subject uses the imperative mood ("add", not "adding")
- Body wraps at 72 characters
- Body explains "what" and "why", not "how"
Can you add a 'Description' in the PR. I did not understand why you are trying to make a HelixTask return Failed when it is cancelled?
@arjun4084346 sorry about missing the description. Adding some explanation on those, specifically returning FAILED in case of CANCELLED is recommended by Helix to maximize the chance of a cancelled task to be rescheduled, so that in Gobblin-Streaming we don't have a kafka partition ( part of cancelled helix) being orphaned.
@arjun4084346 Can you also take a look the PR ? Thanks.
I see. Just keep that in mind that sometimes, we do not want to reschedule it, e.g. when user cancelled the job in GaaS. Is there a way to do that? Will take a deeper look on Monday. Happy Friday!
Codecov Report
Merging #2923 into master will decrease coverage by
0.00%. The diff coverage is37.50%.
@@ Coverage Diff @@
## master #2923 +/- ##
============================================
- Coverage 45.59% 45.59% -0.01%
+ Complexity 9160 9159 -1
============================================
Files 1936 1936
Lines 73295 73300 +5
Branches 8088 8088
============================================
Hits 33418 33418
- Misses 36781 36787 +6
+ Partials 3096 3095 -1
| Impacted Files | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| ...a/org/apache/gobblin/cluster/GobblinHelixTask.java | 77.61% <0.00%> (ø) |
6.00 <0.00> (ø) |
|
| ...e/gobblin/cluster/TaskRunnerSuiteProcessModel.java | 100.00% <ø> (ø) |
5.00 <0.00> (ø) |
|
| ...he/gobblin/writer/FineGrainedWatermarkTracker.java | 82.25% <ø> (ø) |
28.00 <0.00> (ø) |
|
| ...src/main/java/org/apache/gobblin/runtime/Task.java | 67.26% <ø> (ø) |
84.00 <0.00> (ø) |
|
| .../apache/gobblin/runtime/task/BaseAbstractTask.java | 52.38% <0.00%> (-8.74%) |
6.00 <0.00> (ø) |
|
| ...va/org/apache/gobblin/runtime/task/FailedTask.java | 0.00% <0.00%> (ø) |
0.00 <0.00> (ø) |
|
| ...in/java/org/apache/gobblin/cluster/HelixUtils.java | 37.30% <50.00%> (-3.18%) |
15.00 <0.00> (-1.00) |
|
| .../java/org/apache/gobblin/cluster/SleepingTask.java | 48.48% <80.00%> (+9.09%) |
3.00 <0.00> (ø) |
|
| .../org/apache/gobblin/metrics/RootMetricContext.java | 78.12% <0.00%> (-1.57%) |
15.00% <0.00%> (-1.00%) |
|
| ... and 2 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 9ef461a...cc02f49. Read the comment docs.