
Co-Scheduling: it's better to use the time when the first pod of a PodGroup arrives in the SchedulingQueue

Open NoicFank opened this issue 3 years ago • 6 comments

Instead of using the CreationTimestamp of the PodGroup, wouldn't it be better to use the time when the first pod of the PodGroup arrives in the SchedulingQueue? Creating the PodGroup and creating the pods to be scheduled are asynchronous.

// GetCreationTimestamp returns the creation time of a podGroup or a pod.
func (pgMgr *PodGroupManager) GetCreationTimestamp(pod *corev1.Pod, ts time.Time) time.Time {
	pgName := util.GetPodGroupLabel(pod)
	if len(pgName) == 0 {
		return ts
	}
	pg, err := pgMgr.pgLister.PodGroups(pod.Namespace).Get(pgName)
	if err != nil {
		return ts
	}
	return pg.CreationTimestamp.Time
}

In the following example:

  • pg1 is created before pg2
  • there are pod1-0 and pod1-1 in podGroup pg1, and pod2-0 and pod2-1 in pg2, then:
  1. First, pod2-0 and pod2-1 arrive.
  • Queue: pod2-1, pod2-0 ---------> priority
  2. Then the first pod in the queue, pod2-0, is consumed.
  • Queue: pod2-1
  • Processing: pod2-0
  3. Then pod1-0 and pod1-1 arrive.
  • Queue: pod2-1, pod1-1, pod1-0 (since pg1 was created before pg2)
  • Processing: pod2-0
  4. Then the first pod in the queue, pod1-0, is consumed, whereas we expect it to be pod2-1.
  • Queue: pod2-1, pod1-1
  • Processing: pod1-0

whereas the expected situation is:

  • Queue: pod1-1,pod1-0
  • Processing: pod2-1

NoicFank avatar Apr 24 '22 10:04 NoicFank

Theoretically, I think your proposal is only better in some cases. It's not necessarily true that a PodGroup with an early-arriving pod should always be prioritized over a PodGroup with a later-arriving pod.

A good solution is to impose extra semantics on the PodGroup, like a queue. I'm proposing a refined PodGroup API in https://docs.google.com/document/d/16yLngEOd6x3IS6ejkclrCdmkDYfxtTyr97a7uNug_wA/view, and pursuing making it available in k/k.

Huang-Wei avatar Apr 25 '22 21:04 Huang-Wei

Yeah, for sure. Access to the document is denied, though. Please approve my access request, thanks.

NoicFank avatar Apr 26 '22 02:04 NoicFank

Before the proposed PodGroup semantics are available, I think custom and configurable sorting strategies, including the one proposed here, might be helpful. They could be selected via the coscheduling plugin config:

  • Default sorting
  • Sorting based on the first pod's timestamp
  • Prioritizing pods in a pod group that holds a reservation and is waiting for additional pods
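A hypothetical sketch of how such a configurable strategy might surface in the plugin's args; every name here (`SortingStrategy`, the constants, `CoschedulingArgs` field) is invented for illustration and is not part of the real coscheduling API:

```go
package main

import "fmt"

// SortingStrategy selects how the QueueSort plugin orders pods across
// PodGroups. The values mirror the three options listed above.
type SortingStrategy string

const (
	DefaultSort      SortingStrategy = "Default"           // PodGroup CreationTimestamp
	FirstPodSort     SortingStrategy = "FirstPodTimestamp" // enqueue time of the group's first pod
	ReservationFirst SortingStrategy = "ReservationFirst"  // groups already holding reservations
)

// CoschedulingArgs is a stand-in for the plugin's configuration struct,
// extended with the hypothetical strategy field.
type CoschedulingArgs struct {
	SortingStrategy SortingStrategy
}

func main() {
	args := CoschedulingArgs{SortingStrategy: FirstPodSort}
	fmt.Println(args.SortingStrategy) // FirstPodTimestamp
}
```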

yuanchen8911 avatar May 10 '22 18:05 yuanchen8911

Thanks for the advice; I'll look into this.

NoicFank avatar May 23 '22 02:05 NoicFank

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Aug 21 '22 03:08 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Sep 20 '22 03:09 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Oct 20 '22 03:10 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Oct 20 '22 03:10 k8s-ci-robot