
Resolve podgroup residual

Open HDJZX opened this issue 2 years ago • 22 comments

Background

Volcano has no external module that manages PodGroup status based on workload status. As a result, Volcano cannot perceive the lifecycle of upper-level workloads (success, failure, pause, termination, scaling, etc.) and cannot manage PodGroups correctly. The following scenarios occur:

  • When a task (such as a Pod or Job) succeeds or fails, the corresponding PodGroup remains and keeps occupying queue quota.
  • When a workload (such as a Deployment or StatefulSet) is scaled down to 0, the corresponding PodGroup remains and keeps occupying queue quota.

Solution ideas

  • When a pod reaches a final state (Succeeded or Failed), clear the PodGroup corresponding to the pod. If the pod has a controller owner reference, this check is skipped.
  • When the number of StatefulSet replicas is 0, clear the PodGroup corresponding to the StatefulSet.
  • When a job is in a paused state, clear the PodGroup corresponding to the job (this requires support for the job pause operation).
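The three rules above can be sketched as plain predicates. This is only an illustrative sketch, not the code in this PR: the type `PodPhase`, the functions `shouldCleanForPod`, `shouldCleanForStatefulSet`, and `shouldCleanForJob`, and their parameters are hypothetical names, and the real controller would read these values from Kubernetes objects rather than take them as arguments.

```go
package main

import "fmt"

// PodPhase is a hypothetical stand-in for the pod phase reported by Kubernetes.
type PodPhase string

const (
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// shouldCleanForPod reports whether a bare pod's PodGroup should be cleared.
// Pods with a controller owner reference are skipped, as the proposal states.
func shouldCleanForPod(phase PodPhase, hasControllerOwner bool) bool {
	if hasControllerOwner {
		return false
	}
	return phase == PodSucceeded || phase == PodFailed
}

// shouldCleanForStatefulSet reports whether a StatefulSet's PodGroup should
// be cleared, which the proposal ties to a replica count of 0.
func shouldCleanForStatefulSet(replicas int32) bool {
	return replicas == 0
}

// shouldCleanForJob reports whether a job's PodGroup should be cleared.
// "paused" is an assumed flag; the proposal notes that job pause support
// is a prerequisite for this rule.
func shouldCleanForJob(paused bool) bool {
	return paused
}

func main() {
	fmt.Println(shouldCleanForPod(PodSucceeded, false)) // bare pod, finished
	fmt.Println(shouldCleanForPod(PodFailed, true))     // owned pod, skipped
	fmt.Println(shouldCleanForStatefulSet(0))           // scaled down to 0
	fmt.Println(shouldCleanForJob(false))               // job not paused
}
```

Keeping the decisions as pure functions makes each rule easy to unit test in isolation from the controller's watch and delete logic.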

HDJZX avatar Nov 07 '23 09:11 HDJZX

Welcome @HDJZX!

It looks like this is your first PR to volcano-sh/volcano 🎉.

Thank you, and welcome to Volcano. :smiley:

volcano-sh-bot avatar Nov 07 '23 09:11 volcano-sh-bot

Hi, welcome. Please add some background information to explain why this is needed.

Monokaix avatar Nov 20 '23 07:11 Monokaix

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign wpeng102.
You can assign the PR to them by writing /assign @wpeng102 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

volcano-sh-bot avatar Nov 20 '23 09:11 volcano-sh-bot

Update background and solutions https://github.com/volcano-sh/volcano/pull/3182#issue-1980922569

HDJZX avatar Nov 20 '23 10:11 HDJZX

Please squash your commits and get CI passing. LGTM apart from that.

Monokaix avatar Nov 21 '23 09:11 Monokaix

I mean merge everything into one commit and run git commit with -s :)

Monokaix avatar Nov 22 '23 02:11 Monokaix

ok, done.

HDJZX avatar Nov 22 '23 03:11 HDJZX

/lgtm

Monokaix avatar Nov 27 '23 01:11 Monokaix

@Monokaix Can you review again? I removed unused dependencies from the unit tests.

HDJZX avatar Nov 27 '23 03:11 HDJZX

/ok-to-test

Monokaix avatar Nov 27 '23 09:11 Monokaix

@Monokaix Sorry, I seem unable to trigger CI.

HDJZX avatar Nov 27 '23 11:11 HDJZX

Thanks for your contribution, the CI is triggered.

william-wang avatar Nov 27 '23 12:11 william-wang

@Monokaix Can you help trigger it once? Thanks.

HDJZX avatar Nov 28 '23 10:11 HDJZX

> @Monokaix Can you help trigger it once? Thanks.

Sorry, I have no right to trigger CI. Please help, @william-wang.

Monokaix avatar Nov 28 '23 11:11 Monokaix

@william-wang Can you help trigger it once? Thanks.

HDJZX avatar Nov 30 '23 05:11 HDJZX

@Monokaix Can you review it? An lgtm can also trigger CI, thanks.

HDJZX avatar Dec 04 '23 10:12 HDJZX

/lgtm

Monokaix avatar Dec 04 '23 11:12 Monokaix

Hi, is there any problem here?

Monokaix avatar Dec 06 '23 06:12 Monokaix

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] avatar Mar 17 '24 09:03 stale[bot]

@HDJZX Please rebase and squash the commits into one.

lowang-bh avatar Apr 07 '24 12:04 lowang-bh

/needs-rebase

lowang-bh avatar Apr 07 '24 12:04 lowang-bh

/lgtm

Monokaix avatar Apr 08 '24 09:04 Monokaix