[WIP] Add pod diagnosis feature

Open xiaochaoren opened this issue 1 year ago • 1 comments

What changes were proposed in this pull request?

  1. Monitor pods periodically.
  2. Diagnose long-pending pods based on the monitoring data.
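
The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `PodStatus`, `PendingPodMonitor`, `run_monitor`, and the `pending_timeout_secs` threshold are hypothetical names, and `list_pods` stands in for whatever source supplies the pod phases.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

PENDING = "Pending"  # Kubernetes pod phase for a pod that has not started running


@dataclass
class PodStatus:
    """Hypothetical snapshot of one pod: its name and current phase."""
    name: str
    phase: str


class PendingPodMonitor:
    """Hypothetical helper that tracks how long each pod has been Pending."""

    def __init__(self, pending_timeout_secs: float):
        self._timeout = pending_timeout_secs
        # Pod name -> timestamp at which the pod was first observed Pending.
        self._pending_since: Dict[str, float] = {}

    def observe(self, pods: Iterable[PodStatus], now: float) -> None:
        """Step 1: record one monitoring snapshot taken at time `now`."""
        seen = set()
        for pod in pods:
            seen.add(pod.name)
            if pod.phase == PENDING:
                # Keep the earliest timestamp if the pod is still Pending.
                self._pending_since.setdefault(pod.name, now)
            else:
                self._pending_since.pop(pod.name, None)
        # Forget pods that no longer exist.
        for name in list(self._pending_since):
            if name not in seen:
                del self._pending_since[name]

    def diagnose(self, now: float) -> List[str]:
        """Step 2: return names of pods Pending for at least the timeout."""
        return sorted(
            name
            for name, since in self._pending_since.items()
            if now - since >= self._timeout
        )


def run_monitor(
    list_pods: Callable[[], Iterable[PodStatus]],
    monitor: PendingPodMonitor,
    interval_secs: float,
    rounds: int,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll `list_pods` periodically and return the last diagnosis result."""
    long_pending: List[str] = []
    for _ in range(rounds):
        now = clock()
        monitor.observe(list_pods(), now)
        long_pending = monitor.diagnose(now)
        sleep(interval_secs)
    return long_pending
```

Passing the clock and sleep functions as parameters keeps the sketch testable without real waiting. What to do with the returned pod names (for example, relaunching them) is left to the caller.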

Why are the changes needed?

To automatically recover the job when its pods stay pending for a long time.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

xiaochaoren avatar Aug 01 '24 03:08 xiaochaoren

Codecov Report

Attention: Patch coverage is 62.83186% with 42 lines in your changes missing coverage. Please review.

Project coverage is 80.31%. Comparing base (adc8bba) to head (4b58a10). Report is 257 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...r/diagnosis/operator/check_pod_pending_operator.py | 37.14% | 22 Missing :warning: |
| dlrover/python/master/diagnosis/diagnosis.py | 16.66% | 10 Missing :warning: |
| dlrover/python/master/monitor/pod_monitor.py | 80.00% | 6 Missing :warning: |
| dlrover/python/common/diagnosis.py | 85.71% | 2 Missing :warning: |
| dlrover/python/master/watcher/k8s_watcher.py | 85.71% | 2 Missing :warning: |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1219      +/-   ##
==========================================
- Coverage   80.41%   80.31%   -0.11%     
==========================================
  Files         217      219       +2     
  Lines       19463    19567     +104     
==========================================
+ Hits        15652    15715      +63     
- Misses       3811     3852      +41     

:umbrella: View full report in Codecov by Sentry.

codecov[bot] avatar Aug 02 '24 05:08 codecov[bot]

This part of the implementation is under construction. Please refactor your implementation after the next release (v0.4.0).

BalaBalaYi avatar Nov 18 '24 09:11 BalaBalaYi