Jobset minMember support

Open song-william opened this issue 1 year ago • 9 comments

What would you like to be added: Does jobset support a concept like minMember in podgroups? We would like pods within a replicatedjob to be scheduled only if resources are available for all pods in the replicated jobs.

Why is this needed:

We have seen workloads that require >=2 pods schedule only one pod at first (e.g., multi-node PyTorch). The scheduled pod then times out waiting for the other pods to schedule.

song-william avatar Jul 22 '24 17:07 song-william

/cc

googs1025 avatar Jul 23 '24 01:07 googs1025

We would like pods within a replicatedjob to be scheduled only if resources are available for all pods in the replicated jobs.

@song-william for capacity-aware group scheduling behavior like this, we recommend using Kueue. JobSets are a natively supported workload type in Kueue. Here is an example of how to run a JobSet scheduled/managed via Kueue.
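As a rough sketch of what that looks like (not the linked example itself): a JobSet is handed to Kueue by putting the `kueue.x-k8s.io/queue-name` label on it. The JobSet name, the LocalQueue name `user-queue`, the container image, and the resource requests below are all placeholder assumptions, and the cluster is assumed to already have Kueue installed with a matching LocalQueue and ClusterQueue.

```python
import json


def jobset_for_kueue(name: str, queue: str, workers: int) -> dict:
    """Build a JobSet manifest that Kueue will manage via the queue-name label.

    All concrete values (image, CPU request) are placeholders for illustration.
    """
    return {
        "apiVersion": "jobset.x-k8s.io/v1alpha2",
        "kind": "JobSet",
        "metadata": {
            "name": name,
            # This label points the JobSet at a Kueue LocalQueue.
            "labels": {"kueue.x-k8s.io/queue-name": queue},
        },
        "spec": {
            "replicatedJobs": [
                {
                    "name": "workers",
                    "replicas": 1,
                    "template": {
                        "spec": {
                            # One Job whose pods must all run together.
                            "parallelism": workers,
                            "completions": workers,
                            "backoffLimit": 0,
                            "template": {
                                "spec": {
                                    "restartPolicy": "Never",
                                    "containers": [
                                        {
                                            "name": "trainer",
                                            "image": "example.com/trainer:latest",  # placeholder
                                            # Requests are what Kueue counts against quota.
                                            "resources": {"requests": {"cpu": "1"}},
                                        }
                                    ],
                                }
                            },
                        }
                    },
                }
            ]
        },
    }


manifest = jobset_for_kueue("pytorch-multinode", "user-queue", workers=2)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`, since kubectl accepts JSON as well as YAML.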

Here is another, more involved example that shows step by step how to run large-scale TPU Multislice training workloads as JobSets managed by Kueue, including how to configure Kueue properly based on the actual accelerator resources available in the cluster.

danielvegamyhre avatar Jul 24 '24 00:07 danielvegamyhre

Closing for now since there seems to be no follow up question. Feel free to re-open if you want to discuss this further.

danielvegamyhre avatar Jul 29 '24 16:07 danielvegamyhre

@danielvegamyhre We will be leveraging kueue on our cluster soon. Thanks for the response!

With the kueue installation, will kueue guarantee that pods are properly gang-scheduled (e.g minMember behavior)?

song-william avatar Jul 31 '24 15:07 song-william

@danielvegamyhre it seems I don't have the permissions to reopen issues. https://stackoverflow.com/a/21333938

song-william avatar Jul 31 '24 17:07 song-william

With the kueue installation, will kueue guarantee that pods are properly gang-scheduled (e.g minMember behavior)?

Perhaps we should move this issue to the kueue project.

googs1025 avatar Aug 01 '24 00:08 googs1025

FWIW, we have some simpler clusters where the cluster owners are only interested in proper gang-scheduling (e.g., minMember) without the need for full quota controls (e.g., Kueue). I would have expected JobSets to be able to handle this primitive without requiring a full queue/quota system to be installed.

song-william avatar Aug 01 '24 17:08 song-william

We would like pods within a replicatedjob to be scheduled only if resources are available for all pods in the replicated jobs.

@song-william This is only possible if you implement some form of capacity-aware, all-or-nothing scheduling. This is a fairly complicated endeavor, and it applies to more batch workload types than just JobSet. Our thinking was therefore that it makes more sense for this feature to live in Kueue, which is agnostic to the workload type, so that one implementation of gang-scheduling can support any batch workload submitted via Kueue.
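To make "capacity-aware, all-or-nothing" concrete, here is a toy sketch of the idea. It is not how Kueue or any real scheduler is implemented: the function name, the single integer CPU resource, and the first-fit placement are all simplifying assumptions. The point is only that placement is tried against a tentative copy of the free capacity, and nothing is committed unless every pod in the group fits.

```python
from typing import Dict, List, Optional


def place_gang(pod_requests: List[int], node_free: Dict[str, int]) -> Optional[Dict[int, str]]:
    """Place all pods or none.

    pod_requests: CPU needed by each pod in the group (hypothetical units).
    node_free: free CPU per node; updated only if the whole group fits.
    Returns a pod-index -> node-name mapping, or None if any pod does not fit.
    """
    tentative = dict(node_free)  # work on a copy so a failure leaves no trace
    assignment: Dict[int, str] = {}
    for index, need in enumerate(pod_requests):
        # First-fit is a heuristic: it can miss a packing that exists.
        target = next((node for node, free in tentative.items() if free >= need), None)
        if target is None:
            return None  # one pod does not fit, so the whole group is rejected
        tentative[target] -= need
        assignment[index] = target
    node_free.update(tentative)  # commit the reservation for the full group
    return assignment


cluster = {"node-a": 4, "node-b": 4}
two_pods = place_gang([4, 4], cluster)    # both fit, capacity is reserved
third_try = place_gang([4], cluster)      # cluster is now full, nothing is placed
```

Without the all-or-nothing check, the first pod of a group that does not fully fit would start alone, which is the timeout scenario described at the top of this issue.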

However, I do understand the hesitancy to add a new, complex dependency to your stack. Maybe we can consider whether it makes sense to support some simple form of gang-scheduling in JobSet for cases like this. cc @alculquicondor

danielvegamyhre avatar Aug 08 '24 21:08 danielvegamyhre

Capacity awareness is not (and shouldn't be) a concern of the jobset project. This should be achieved by Kueue or other schedulers.

alculquicondor avatar Aug 13 '24 17:08 alculquicondor

@danielvegamyhre We will be leveraging kueue on our cluster soon. Thanks for the response!

Sounds good, closing this issue for now. Feel free to tag me if you have any follow up questions. For now we have no plans to implement gang-scheduling in JobSet itself.

danielvegamyhre avatar Oct 05 '24 17:10 danielvegamyhre