dlrover
[WIP] Add pod diagnosis feature
What changes were proposed in this pull request?
- Monitor pods periodically.
- Diagnose long-pending pods based on the monitoring data.
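
The two steps above could be sketched roughly as follows. This is only an illustration and not the code in this PR: `PodSnapshot`, `PendingPodMonitor`, and the `pending_timeout_secs` parameter are hypothetical names, and the sketch assumes the monitor is periodically fed pod phases obtained elsewhere (for example from the Kubernetes API).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PodSnapshot:
    """One observation of a pod (hypothetical type, for illustration only)."""
    name: str
    phase: str          # Kubernetes pod phase, e.g. "Pending" or "Running"
    observed_at: float  # timestamp of the observation, in seconds


class PendingPodMonitor:
    """Tracks how long each pod has been continuously Pending."""

    def __init__(self, pending_timeout_secs: float = 900.0):
        # Assumed threshold; the real value would come from job configuration.
        self._timeout = pending_timeout_secs
        self._pending_since: Dict[str, float] = {}

    def observe(self, snapshots: List[PodSnapshot]) -> None:
        """Record one periodic monitoring pass."""
        seen = set()
        for snap in snapshots:
            seen.add(snap.name)
            if snap.phase == "Pending":
                # Keep the first time the pod was seen Pending.
                self._pending_since.setdefault(snap.name, snap.observed_at)
            else:
                # The pod left Pending, so stop tracking it.
                self._pending_since.pop(snap.name, None)
        # Forget pods that no longer exist.
        for name in list(self._pending_since):
            if name not in seen:
                del self._pending_since[name]

    def diagnose(self, now: float) -> List[str]:
        """Return the names of pods Pending for at least the timeout."""
        return sorted(
            name
            for name, since in self._pending_since.items()
            if now - since >= self._timeout
        )
```

A caller would invoke `observe` on a fixed interval and pass the result of `diagnose` to whatever recovery action the job master applies, such as relaunching the pod.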
Why are the changes needed?
To automatically recover a job whose pods stay pending for too long.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Codecov Report
Attention: Patch coverage is 62.83186% with 42 lines in your changes missing coverage. Please review.
Project coverage is 80.31%. Comparing base (adc8bba) to head (4b58a10). Report is 257 commits behind head on master.
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1219      +/-   ##
==========================================
- Coverage   80.41%   80.31%   -0.11%
==========================================
  Files         217      219       +2
  Lines       19463    19567     +104
==========================================
+ Hits        15652    15715      +63
- Misses       3811     3852      +41
This part of the implementation is under construction. Please refactor your implementation after the next release (v0.4.0).