
[Proposal] Implementing an Automatic Re-Queuing and Status Transition Mechanism

Status: Open. flyingfang opened this issue 5 months ago (7 comments).

What is the problem you're trying to solve

We can implement a mechanism that automatically re-queues jobs that have entered the "Inqueue" status but have remained unscheduled for an extended period, transitioning them back to the "Pending" status. This frees the queue for other work and helps ensure that jobs are processed in a timely manner.

When jobs in the scheduling queue remain unscheduled for a long time because of factors such as affinity constraints or resource fragmentation, they occupy queue quota while blocking subsequent jobs from entering the queue. By automatically releasing the resources held by these jobs, the system can raise the queue's allocation rate and use its resources more efficiently.
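To illustrate why a stuck job hurts the queue, the sketch below assumes that admitting a job into Inqueue reserves its minimum resource request against the queue's capability. The `Queue` type and its `tryEnqueue` and `release` methods are hypothetical, and resources are reduced to a single integer.

```go
package main

import "fmt"

// Queue tracks a queue's capacity and the amount reserved by Inqueue jobs.
// Units are abstract (e.g. CPU cores); all names are hypothetical.
type Queue struct {
	Capability int
	Reserved   int
}

// tryEnqueue admits a job into Inqueue only if its minimum request
// still fits under the queue's capability, and reserves it if so.
func (q *Queue) tryEnqueue(minReq int) bool {
	if q.Reserved+minReq > q.Capability {
		return false
	}
	q.Reserved += minReq
	return true
}

// release returns a re-queued job's reservation to the queue.
func (q *Queue) release(minReq int) {
	q.Reserved -= minReq
}

func main() {
	q := &Queue{Capability: 10}
	fmt.Println(q.tryEnqueue(8)) // stuck job A is admitted: true
	fmt.Println(q.tryEnqueue(4)) // job B is blocked by A's reservation: false
	q.release(8)                 // A is re-queued back to Pending
	fmt.Println(q.tryEnqueue(4)) // B can now enter the queue: true
}
```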

Describe the solution you'd like

We can add a requeue action that implements the re-enqueue process. The action traverses all jobs in the Pending and Inqueue statuses, as well as jobs that have pending tasks. For each job, a registerable function, ssn.JobRequeueable, determines whether the job needs to be re-queued. Then, for each job selected for re-queuing, another registerable function, ssn.JobRequeue, re-queues its tasks.

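```go
// Session information for the current session
type Session struct {
	UID types.UID
	// ...
	// requeueableFns maps a plugin name to the vote function that decides
	// whether a job should be re-queued (consulted by ssn.JobRequeueable).
	requeueableFns map[string]api.VoteFn
	// jobRequeueFns maps a plugin name to the function that re-queues
	// a job (invoked by ssn.JobRequeue).
	jobRequeueFns map[string]api.JobRequeueFn
	// ...
}

// The following types are declared in the api package.

// JobRequeueFn re-queues the given job and reports any failure.
type JobRequeueFn func(*JobInfo) error

// VoteFn returns an integer vote on the given object; here it decides
// whether a job is requeueable.
type VoteFn func(interface{}) int
```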

Additional context

No response

flyingfang, Sep 11 '24 02:09