scheduler-plugins

cache assigned pod count

Open KunWuLuan opened this issue 1 year ago • 15 comments

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR speeds up how the Coscheduling plugin counts Pods that have already been assumed, by caching the assigned Pod count.
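For illustration only, here is a minimal sketch of what caching the assigned Pod count could look like. It is not the code in this PR: the type `assignedCache`, its methods, and the `namespace/name` key format are hypothetical names chosen for the example, and the real plugin works with Kubernetes API objects rather than plain strings.

```go
package main

import (
	"fmt"
	"sync"
)

// assignedCache is a hypothetical cache that maps a pod group key
// ("namespace/name") to the set of pod names already assumed or bound.
type assignedCache struct {
	mu       sync.RWMutex
	assigned map[string]map[string]struct{}
}

func newAssignedCache() *assignedCache {
	return &assignedCache{assigned: make(map[string]map[string]struct{})}
}

// Add records that pod was assumed for the given pod group.
func (c *assignedCache) Add(pg, pod string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.assigned[pg] == nil {
		c.assigned[pg] = make(map[string]struct{})
	}
	c.assigned[pg][pod] = struct{}{}
}

// Remove forgets pod, for example after it is deleted.
func (c *assignedCache) Remove(pg, pod string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.assigned[pg], pod) // deleting from a nil map is a no-op
	if len(c.assigned[pg]) == 0 {
		delete(c.assigned, pg)
	}
}

// Count returns the number of assigned pods without listing any pods.
func (c *assignedCache) Count(pg string) int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return len(c.assigned[pg])
}

func main() {
	c := newAssignedCache()
	c.Add("default/pg1", "worker-0")
	c.Add("default/pg1", "worker-1")
	c.Add("default/pg1", "worker-1") // repeated events are idempotent
	c.Remove("default/pg1", "worker-0")
	fmt.Println(c.Count("default/pg1")) // prints 1
}
```

Keeping a set of names rather than a bare counter means repeated add or delete events for the same Pod cannot skew the count.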

Which issue(s) this PR fixes:

Fix #707

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE. This is a performance enhancement. Users do not need to do anything to use it.

KunWuLuan avatar Mar 28 '24 12:03 KunWuLuan

Hi @KunWuLuan. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 28 '24 12:03 k8s-ci-robot

Deploy Preview for kubernetes-sigs-scheduler-plugins canceled.

Name Link
Latest commit 1c63722a8288800760a4c35c27724c7fb2faa161
Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-scheduler-plugins/deploys/6618a37f99df4600082a280c

netlify[bot] avatar Mar 28 '24 12:03 netlify[bot]

/ok-to-test

Huang-Wei avatar Apr 01 '24 21:04 Huang-Wei

Could you help fix the CI failures?

@Huang-Wei Hi, I have fixed the CI failures. Please have a look when you have time, thanks.

KunWuLuan avatar Apr 03 '24 09:04 KunWuLuan

I forgot one thing about the cache's consistency during one scheduling cycle - we will need to:

  • snapshot the pg->podNames map at the beginning of the scheduling cycle (PreFilter), so that we can treat it as source of truth during the whole scheduling cycle

  • support preemption

    • implement the Clone() function
    • for each PodAddition dryrun, if the pod is hit, add it
    • for each PodDeletion dryrun, if the pod is hit, remove it
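A minimal sketch of the steps listed above, under stated assumptions: `cycleState`, `snapshot`, `AddPod`, and `RemovePod` are simplified, hypothetical stand-ins and do not match the real scheduler framework signatures, which take more parameters. "Hit" is interpreted here as "the pod belongs to a pod group present in the snapshot", which is an assumption about the reviewer's intent.

```go
package main

import "fmt"

// cycleState is a hypothetical per-cycle snapshot of pg -> pod names.
type cycleState struct {
	podNames map[string]map[string]struct{}
}

// snapshot deep-copies the live map so that later cache updates cannot
// leak into a scheduling cycle that is already in progress.
func snapshot(live map[string]map[string]struct{}) *cycleState {
	s := &cycleState{podNames: make(map[string]map[string]struct{}, len(live))}
	for pg, pods := range live {
		cp := make(map[string]struct{}, len(pods))
		for p := range pods {
			cp[p] = struct{}{}
		}
		s.podNames[pg] = cp
	}
	return s
}

// Clone returns an independent copy for a preemption dry run.
func (s *cycleState) Clone() *cycleState { return snapshot(s.podNames) }

// AddPod simulates a PodAddition dry run: the pod is counted only if
// its group is tracked in this snapshot.
func (s *cycleState) AddPod(pg, pod string) {
	if pods, ok := s.podNames[pg]; ok {
		pods[pod] = struct{}{}
	}
}

// RemovePod simulates a PodDeletion dry run, for example evicting a victim.
func (s *cycleState) RemovePod(pg, pod string) {
	if pods, ok := s.podNames[pg]; ok {
		delete(pods, pod)
	}
}

func main() {
	live := map[string]map[string]struct{}{
		"default/pg1": {"worker-0": {}, "worker-1": {}},
	}
	state := snapshot(live)                      // taken at the start of the cycle
	live["default/pg1"]["worker-2"] = struct{}{} // later update to the live cache

	dry := state.Clone()
	dry.RemovePod("default/pg1", "worker-0")

	// Live cache, cycle snapshot, and dry-run copy now differ: prints 3 2 1.
	fmt.Println(len(live["default/pg1"]), len(state.podNames["default/pg1"]), len(dry.podNames["default/pg1"]))
}
```

The deep copy is the point of the sketch: a shallow copy would share the inner sets, so a dry-run removal would silently modify the state the rest of the cycle relies on.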

We only check the number of pods assigned in Permit, so I think there is no inconsistency during one scheduling cycle.

Also, PostFilter does not invoke the Permit plugin, so implementing PodAddition and PodDeletion would have no effect on preemption, right?

What we can do is return framework.Unschedulable if the PodDeletion would cause a pod group to be rejected, but I don't think that is enough for coscheduling preemption.
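A rough sketch of the two checks discussed here, assuming a simplified model: `podGroup`, `permitAllows`, and `removalStatus` are hypothetical names, the status is a plain string rather than a framework status object, and the `+1` assumes the pod being scheduled is not yet in the cache when the quorum check runs.

```go
package main

import "fmt"

// podGroup is a hypothetical stand-in for a PodGroup plus its cached members.
type podGroup struct {
	minMember int
	assigned  map[string]struct{}
}

// permitAllows sketches a Permit-style quorum check. The pod currently
// being scheduled is assumed not to be in the cache yet, hence the +1.
func permitAllows(pg podGroup) bool {
	return len(pg.assigned)+1 >= pg.minMember
}

// removalStatus sketches the PodDeletion idea: report "Unschedulable" when
// evicting pod would leave its group with fewer than minMember members.
func removalStatus(pg podGroup, pod string) string {
	if _, ok := pg.assigned[pod]; !ok {
		return "Success" // the pod is not a member of this group
	}
	if len(pg.assigned)-1 < pg.minMember {
		return "Unschedulable"
	}
	return "Success"
}

func main() {
	pg := podGroup{
		minMember: 3,
		assigned:  map[string]struct{}{"w-0": {}, "w-1": {}, "w-2": {}},
	}
	fmt.Println(permitAllows(pg))           // true
	fmt.Println(removalStatus(pg, "w-0"))   // Unschedulable
	fmt.Println(removalStatus(pg, "other")) // Success
}
```

This only vetoes a single victim at a time; it does not decide whether preempting a whole group is acceptable, which is the gap the comment above points at.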

I think supporting preemption for coscheduling is complicated and may belong in a separate issue, where we can determine the expected behavior for coscheduling preemption. WDYT? #581

KunWuLuan avatar Apr 12 '24 03:04 KunWuLuan

Also, PostFilter does not invoke the Permit plugin, so implementing PodAddition and PodDeletion would have no effect on preemption, right?

Yes, the current preemption skeleton code assumes each plugin only uses PreFilter to pre-calculate state. But for coscheduling, PreFilter can fail early (upon inadequate quorum).

I think the scheduler framework should open up a hook for out-of-tree plugins to choose whether or not to run PreFilter as part of preemption; otherwise, an out-of-tree plugin has to rewrite the PostFilter implementation to work around that part.

I think supporting preemption for coscheduling is complicated and may belong in a separate issue, where we can determine the expected behavior for coscheduling preemption. WDYT?

Let's consolidate all the cases and use a new PR to try to tackle it. Thanks.

Huang-Wei avatar Apr 12 '24 05:04 Huang-Wei

@KunWuLuan are you OK with postponing this PR's merge until after I cut the v0.28 release, so that we have more time for soak testing?

And could you add a release-note to highlight it's a performance enhancement?

Huang-Wei avatar Apr 12 '24 05:04 Huang-Wei

@KunWuLuan are you OK with postponing this PR's merge until after I cut the v0.28 release, so that we have more time for soak testing?

And could you add a release-note to highlight it's a performance enhancement?

Ok, no problem.

KunWuLuan avatar Apr 12 '24 06:04 KunWuLuan

Also, PostFilter does not invoke the Permit plugin, so implementing PodAddition and PodDeletion would have no effect on preemption, right?

Yes, the current preemption skeleton code assumes each plugin only uses PreFilter to pre-calculate state. But for coscheduling, PreFilter can fail early (upon inadequate quorum).

I think the scheduler framework should open up a hook for out-of-tree plugins to choose whether or not to run PreFilter as part of preemption; otherwise, an out-of-tree plugin has to rewrite the PostFilter implementation to work around that part.

I think supporting preemption for coscheduling is complicated and may belong in a separate issue, where we can determine the expected behavior for coscheduling preemption. WDYT?

Let's consolidate all the cases and use a new PR to try to tackle it. Thanks.

OK. I will try to design a preemption framework in PostFilter, and if an implementation in PostFilter is enough, I will create a new PR to track the KEP. Otherwise, I will try to open a discussion in kubernetes/scheduling-sigs.

KunWuLuan avatar Apr 12 '24 07:04 KunWuLuan

/cc

ffromani avatar Apr 12 '24 07:04 ffromani

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 14 '24 13:07 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Aug 13 '24 13:08 k8s-triage-robot

/remove-lifecycle rotten

KunWuLuan avatar Aug 13 '24 13:08 KunWuLuan

/lifecycle stale

k8s-triage-robot avatar Nov 11 '24 13:11 k8s-triage-robot

/remove-lifecycle stale

KunWuLuan avatar Nov 12 '24 01:11 KunWuLuan

@KunWuLuan could you resolve the conflicts? It'll be good to merge afterwards.

Huang-Wei avatar Jan 11 '25 21:01 Huang-Wei

/label tide/merge-method-squash

Huang-Wei avatar Jan 11 '25 21:01 Huang-Wei

Deploy Preview for kubernetes-sigs-scheduler-plugins canceled.

Name Link
Latest commit edd5da8aaf7be8037c083025bd14791c63f4e192
Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-scheduler-plugins/deploys/6784fd8ce61d9c00086b9e79

netlify[bot] avatar Jan 13 '25 11:01 netlify[bot]

@Huang-Wei Hi, I have resolved the conflicts and made the tests pass. Please have a look when you have time. Thanks.

KunWuLuan avatar Jan 13 '25 11:01 KunWuLuan

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Huang-Wei, KunWuLuan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Jan 13 '25 18:01 k8s-ci-robot