
gang plugin's jobStarving behavior is not as expected

Open zhaixigui opened this issue 2 years ago • 5 comments

What happened: Preemption and the gang plugin are both enabled. When a queue's resources are exhausted and a high-priority job is submitted to that queue, we expect it to preempt the low-priority job. Unfortunately, the preemption fails.

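Reason: To decide whether a job is starving, gang's jobStarvingFn calls CheckTaskMinAvailablePipelined, which counts each task's allocated, succeeded and pipelined pods (plus pending pods with empty resource requests). Unless ji.MinAvailable < ji.TaskMinAvailableTotal (in which case the per-task check is skipped), any task whose count is below its TaskMinAvailable makes the check return false. jobStarvingFn then reports the job as not starving, so it is never considered as a preemptor.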

plugins/gang/gang.go

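jobStarvingFn := func(obj interface{}) bool {
	ji := obj.(*api.JobInfo)
	// Waiting (pipelined) plus ready task counts.
	occupied := ji.WaitingTaskNum() + ji.ReadyTaskNum()
	// Starving only if the per-task check passes and the job as a whole
	// is still below MinAvailable.
	if ji.CheckTaskMinAvailablePipelined() && occupied < ji.MinAvailable {
		return true
	}
	return false
}
ssn.AddJobStarvingFns(gp.Name(), jobStarvingFn)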

api/job_info.go

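// CheckTaskMinAvailablePipelined returns whether the occupied pods of every task meet its minAvailable.
func (ji *JobInfo) CheckTaskMinAvailablePipelined() bool {
	// Skip the per-task check when task minimums sum to more than the job minimum.
	if ji.MinAvailable < ji.TaskMinAvailableTotal {
		return true
	}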
	occupiedMap := map[TaskID]int32{}
	for status, tasks := range ji.TaskStatusIndex {
		if AllocatedStatus(status) ||
			status == Succeeded ||
			status == Pipelined {
			for _, task := range tasks {
				occupiedMap[getTaskID(task.Pod)]++
			}
			continue
		}

		if status == Pending {
			for _, task := range tasks {
				if task.InitResreq.IsEmpty() {
					occupiedMap[getTaskID(task.Pod)]++
				}
			}
		}
	}
	for taskID, minNum := range ji.TaskMinAvailable {
		if occupiedMap[taskID] < minNum {
			klog.V(4).Infof("Job %s/%s Task %s occupied %v less than task min avaliable", ji.Namespace, ji.Name, taskID, occupiedMap[taskID])
			return false
		}
	}
	return true
}

Scheduler log:
I0523 10:58:45.907498       1 preempt.go:41] Enter Preempt ...
I0523 10:58:45.907502       1 job_info.go:712] job vc-high-job-10002/prajna actual: map[nginx:5], ji.TaskMinAvailable: map[nginx:5]
I0523 10:58:45.907516       1 preempt.go:63] Added Queue <10002> for Job <prajna/vc-high-job-10002>
I0523 10:58:45.907553       1 job_info.go:783] Job prajna/vc-high-job-10002 Task nginx occupied 0 less than task min avaliable
I0523 10:58:45.907559       1 preempt.go:63] Added Queue <10003> for Job <prajna/vc-low-test-1003>
I0523 10:58:45.907577       1 preempt.go:63] Added Queue <10001> for Job <prajna/vc-low-job>
I0523 10:58:45.907593       1 job_info.go:712] job vc-low-job-10002-10/prajna actual: map[nginx:10], ji.TaskMinAvailable: map[nginx:1]
I0523 10:58:45.907610       1 preempt.go:100] No preemptors in Queue <10003>, break.
I0523 10:58:45.907614       1 preempt.go:100] No preemptors in Queue <10001>, break.
I0523 10:58:45.907617       1 preempt.go:100] No preemptors in Queue <10002>, break.
I0523 10:58:45.907621       1 statement.go:381] Committing operations ...
I0523 10:58:45.907624       1 preempt.go:225] Leaving Preempt ...

zhaixigui, May 23 '22
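To see why the high-priority job never becomes a preemptor, here is a minimal, self-contained Go model of jobStarvingFn and CheckTaskMinAvailablePipelined. The `job` type and its fields are hypothetical simplifications of `api.JobInfo`: per-task occupied counts stand in for the allocated, succeeded and pipelined pods in TaskStatusIndex, and pending pods with empty resource requests are ignored. `minAvailable: 5` for the high-priority job is an assumption, since the log only shows its task minimums.

```go
package main

import "fmt"

// job is a hypothetical, simplified stand-in for Volcano's api.JobInfo.
type job struct {
	minAvailable     int32            // job-level minAvailable
	taskMinAvailable map[string]int32 // per-task minAvailable, keyed by task name
	occupied         map[string]int32 // allocated + succeeded + pipelined pods per task
}

// occupiedTotal plays the role of WaitingTaskNum() + ReadyTaskNum().
func (j job) occupiedTotal() int32 {
	var n int32
	for _, c := range j.occupied {
		n += c
	}
	return n
}

// taskMinAvailablePipelined mirrors the logic of CheckTaskMinAvailablePipelined.
func (j job) taskMinAvailablePipelined() bool {
	var total int32
	for _, m := range j.taskMinAvailable {
		total += m
	}
	// Per-task minimums are skipped when they sum to more than the job minimum.
	if j.minAvailable < total {
		return true
	}
	for name, m := range j.taskMinAvailable {
		if j.occupied[name] < m { // missing keys read as 0
			return false
		}
	}
	return true
}

// starving mirrors the logic of gang's jobStarvingFn.
func (j job) starving() bool {
	return j.taskMinAvailablePipelined() && j.occupiedTotal() < j.minAvailable
}

func main() {
	// Reported case: task nginx needs 5 pods and none are placed yet.
	high := job{
		minAvailable:     5, // assumption: not shown in the log
		taskMinAvailable: map[string]int32{"nginx": 5},
		occupied:         map[string]int32{},
	}
	fmt.Println(high.starving()) // false: the job is never added as a preemptor
}
```

In this model, whenever the task minimums sum exactly to minAvailable, a passing per-task check already implies occupiedTotal >= minAvailable, so `starving` can never return true for such a job.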

Any update?

zhaixigui, May 30 '22

@shinytang6, could you help with this issue?

I am not clear on why ji.MinAvailable < ji.TaskMinAvailableTotal was introduced in CheckTaskMinAvailablePipelined. For a high-priority job with job.MinAvailable=1 and task.replicas=1, occupied is 0, so this check blocks preemption in gang. But with job.MinAvailable=1 and task.replicas=2, the job can trigger preemption of the low-priority job.

wpeng102, Jun 01 '22
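wpeng102's two cases can be reproduced with a small self-contained model of the current check. The `starving` function below is a hypothetical simplification of jobStarvingFn combined with CheckTaskMinAvailablePipelined (occupied pods per task stand in for allocated, succeeded and pipelined pods), and it assumes each task's minAvailable equals its replica count.

```go
package main

import "fmt"

// starving is a hypothetical, simplified model of gang's jobStarvingFn plus
// CheckTaskMinAvailablePipelined. occupied holds allocated + succeeded +
// pipelined pods per task; taskMin holds each task's minAvailable.
func starving(minAvailable int32, taskMin, occupied map[string]int32) bool {
	var taskMinTotal, occupiedTotal int32
	for _, m := range taskMin {
		taskMinTotal += m
	}
	for _, n := range occupied {
		occupiedTotal += n
	}
	taskCheckOK := true
	// The per-task check only runs when it is not skipped by the
	// MinAvailable < TaskMinAvailableTotal early return.
	if minAvailable >= taskMinTotal {
		for name, m := range taskMin {
			if occupied[name] < m {
				taskCheckOK = false
				break
			}
		}
	}
	return taskCheckOK && occupiedTotal < minAvailable
}

func main() {
	// job.MinAvailable=1, nothing running yet; task minAvailable assumed equal to replicas.
	for _, replicas := range []int32{1, 2} {
		s := starving(1, map[string]int32{"nginx": replicas}, map[string]int32{})
		fmt.Printf("replicas=%d starving=%v\n", replicas, s)
	}
	// Output:
	// replicas=1 starving=false
	// replicas=2 starving=true
}
```

The only difference between the two runs is whether MinAvailable < TaskMinAvailableTotal holds: with two replicas the per-task check is skipped, so the job-level comparison 0 < 1 decides.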

I looked into it. I think the logic in jobStarvingFn is not right, and we need to write a new function to handle this.

The reason for introducing ji.MinAvailable < ji.TaskMinAvailableTotal is that if sum(task.minAvailable) > job.minAvailable, we need to ignore the checks related to task.minAvailable. Otherwise, meeting every task's minAvailable would already guarantee more than job.minAvailable pods, which makes the job.minAvailable check meaningless.

shinytang6, Jun 01 '22
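shinytang6 suggests writing a new function for the starving check. One possible direction, sketched here as an assumption and not necessarily what Volcano adopted, is to ignore per-task minimums for preemption and compare only the job's occupied pods with its job-level minAvailable. `jobStarvingJobLevelOnly` is a hypothetical name, and `minAvailable=5` for the reported job is assumed.

```go
package main

import "fmt"

// jobStarvingJobLevelOnly is a HYPOTHETICAL alternative to gang's jobStarvingFn:
// it ignores per-task minAvailable and treats a job as starving while its
// occupied pods (allocated, succeeded or pipelined) are fewer than the
// job-level minAvailable.
func jobStarvingJobLevelOnly(minAvailable int32, occupied map[string]int32) bool {
	var total int32
	for _, n := range occupied {
		total += n
	}
	return total < minAvailable
}

func main() {
	// Reported high-priority job: no pods placed yet (minAvailable=5 is assumed).
	fmt.Println(jobStarvingJobLevelOnly(5, map[string]int32{})) // true
	// wpeng102's case: MinAvailable=1, one replica, nothing running.
	fmt.Println(jobStarvingJobLevelOnly(1, map[string]int32{})) // true
	// A job that already holds its minimum is not starving.
	fmt.Println(jobStarvingJobLevelOnly(1, map[string]int32{"nginx": 1})) // false
}
```

The trade-off is that a job whose total already reaches minAvailable, but whose individual task is still below its task minimum, would no longer be treated as starving; this sketch does not settle whether that is acceptable.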

BTW, I can submit a PR to fix it.

shinytang6, Jun 01 '22

Hello 👋 Looks like there was no activity on this issue for the last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there is no activity for 60 more days, this issue will be closed (we can always reopen an issue if we need!).

stale[bot], Sep 08 '22