
Do not create Jobs and the service when JobSet is suspended

Open mimowo opened this issue 1 year ago • 36 comments

To reproduce, just create a simple JobSet with spec.suspend=true. Even in that case, the Jobs and the service are created.

This is wasteful for JobSets which are queued by Kueue, as they may potentially stay in the queue for a long time.
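For reference, a minimal sketch of such a suspended JobSet built with the Go API (assuming the sigs.k8s.io/jobset/api/jobset/v1alpha2 types; the names and image below are placeholders):

```go
package main

import (
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/ptr"
	jobsetv1alpha2 "sigs.k8s.io/jobset/api/jobset/v1alpha2"
)

func main() {
	js := &jobsetv1alpha2.JobSet{
		ObjectMeta: metav1.ObjectMeta{Name: "suspended-jobset", Namespace: "default"},
		Spec: jobsetv1alpha2.JobSetSpec{
			// Created in a suspended state, e.g. to be queued by Kueue.
			Suspend: ptr.To(true),
			ReplicatedJobs: []jobsetv1alpha2.ReplicatedJob{{
				Name:     "workers",
				Replicas: 4,
				Template: batchv1.JobTemplateSpec{
					Spec: batchv1.JobSpec{
						Completions: ptr.To[int32](1),
						Parallelism: ptr.To[int32](1),
						Template: corev1.PodTemplateSpec{
							Spec: corev1.PodSpec{
								RestartPolicy: corev1.RestartPolicyNever,
								Containers: []corev1.Container{{
									Name:  "worker",
									Image: "busybox",
									Args:  []string{"sleep", "60"},
								}},
							},
						},
					},
				},
			}},
		},
	}
	// Applying this object (via kubectl or a controller-runtime client) reproduces the
	// issue: the child Jobs and the Service appear even though spec.suspend is true.
	fmt.Printf("JobSet %s: suspend=%v, replicatedJobs=%d\n",
		js.Name, *js.Spec.Suspend, len(js.Spec.ReplicatedJobs))
}
```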

mimowo avatar Apr 19 '24 15:04 mimowo

/cc @danielvegamyhre @kannon92 @alculquicondor

mimowo avatar Apr 19 '24 15:04 mimowo

I think this could be particularly wasteful when the Jobs are rather small but have a high replica count.

alculquicondor avatar Apr 19 '24 15:04 alculquicondor

/kind feature

kannon92 avatar Apr 19 '24 15:04 kannon92

I know that when I did this, I figured I would just use JobSet.Spec.Suspend and set that on all Jobs that are created. Resuming then means resuming the individual Jobs.
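Roughly, that approach amounts to something like the following sketch (a fragment, not the actual controller code; field names are from the v1alpha2 Go API, and the child-Job naming scheme here is just illustrative):

```go
package sketch

import (
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	jobsetv1alpha2 "sigs.k8s.io/jobset/api/jobset/v1alpha2"
)

// childJobFor sketches the current approach: the JobSet-level suspend flag is copied
// onto every child Job, so resuming the JobSet means un-suspending each child Job.
func childJobFor(js *jobsetv1alpha2.JobSet, rj jobsetv1alpha2.ReplicatedJob, idx int) *batchv1.Job {
	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			Name:      fmt.Sprintf("%s-%s-%d", js.Name, rj.Name, idx),
			Namespace: js.Namespace,
		},
		Spec: *rj.Template.Spec.DeepCopy(),
	}
	// Propagate JobSet.Spec.Suspend to the child Job's spec.suspend.
	job.Spec.Suspend = js.Spec.Suspend
	return job
}
```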

I can see why maybe we would want to go a different route. I tagged this as a feature.

kannon92 avatar Apr 19 '24 15:04 kannon92

Is it a big deal to have the service created?

kannon92 avatar Apr 19 '24 15:04 kannon92

Maybe not a "big" deal, but Kueue is typically used to hold long queues of suspended Jobs (or JobSets), say 50k, so it would be nice not to create them.

I imagine it would be fine to keep an already-created service for a JobSet that got suspended (it was running, but got preempted). There should not be too many preemptions, and we could save on recreating it in case the JobSet is quickly re-admitted.

mimowo avatar Apr 19 '24 15:04 mimowo

I think the main tricky point would be support for startup policy and suspend.

Our implementation of suspend with startup policy was to resume the replicated Jobs in the order they are listed.

I guess this change could clean that up, since we would only create the Jobs once they are resumed. But it may be a bit tricky to implement...
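To make the interaction concrete, here is a hypothetical sketch (not the actual controller code) of how "create on resume" could coexist with an in-order startup policy; ensureJobsFor is a made-up helper standing in for the real creation and readiness logic:

```go
package sketch

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
	jobsetv1alpha2 "sigs.k8s.io/jobset/api/jobset/v1alpha2"
)

// ensureJobsFor is a hypothetical helper: it would create any missing child Jobs for
// one ReplicatedJob and report whether it created anything and whether they are ready.
func ensureJobsFor(ctx context.Context, c client.Client, js *jobsetv1alpha2.JobSet, rj jobsetv1alpha2.ReplicatedJob) (created, ready bool, err error) {
	// ... real creation / readiness-check logic would go here ...
	return false, true, nil
}

// reconcileReplicatedJobs sketches "create on resume" combined with an in-order
// startup policy: nothing is created while suspended, and once resumed the
// ReplicatedJobs are created in the order they are listed, one group at a time.
func reconcileReplicatedJobs(ctx context.Context, c client.Client, js *jobsetv1alpha2.JobSet) error {
	if js.Spec.Suspend != nil && *js.Spec.Suspend {
		// Suspended: do not create any child Jobs (or the Service) yet.
		return nil
	}
	for _, rj := range js.Spec.ReplicatedJobs {
		created, ready, err := ensureJobsFor(ctx, c, js, rj)
		if err != nil {
			return err
		}
		if created || !ready {
			// In-order startup: stop and retry on the next reconcile, so later
			// ReplicatedJobs are only created after this one is ready.
			return nil
		}
	}
	return nil
}
```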

kannon92 avatar Apr 19 '24 17:04 kannon92

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 18 '24 18:07 k8s-triage-robot

/remove-lifecycle stale

I'm investigating this so that we can fix https://github.com/kubernetes-sigs/jobset/pull/625

mimowo avatar Jul 26 '24 11:07 mimowo

I don't quite follow why this is required for #625. Can you explain?

kannon92 avatar Jul 26 '24 19:07 kannon92

I don't quite follow why this is required for #625. Can you explain?

Consider the following chain of states: suspend - resume 1 - suspend - resume 2.

Here resume 1 and 2 may use different Pod templates (updated during suspend), so we need to recreate the Jobs at some point.

Deleting the Jobs in the suspend phase seems simplest. The Job controller also deletes Pods on suspend.
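A rough sketch of what that could look like in the reconciler, assuming a controller-runtime client; the jobset.sigs.k8s.io/jobset-name label selector is used here only for illustration:

```go
package sketch

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	jobsetv1alpha2 "sigs.k8s.io/jobset/api/jobset/v1alpha2"
)

// deleteChildJobsOnSuspend sketches the idea: when the JobSet transitions to
// suspended, delete its child Jobs so that a later resume recreates them from a
// possibly updated Pod template.
func deleteChildJobsOnSuspend(ctx context.Context, c client.Client, js *jobsetv1alpha2.JobSet) error {
	if js.Spec.Suspend == nil || !*js.Spec.Suspend {
		return nil
	}
	// Background propagation lets the garbage collector clean up the Pods,
	// analogous to the Job controller deleting Pods when a Job is suspended.
	return c.DeleteAllOf(ctx, &batchv1.Job{},
		client.InNamespace(js.Namespace),
		client.MatchingLabels{"jobset.sigs.k8s.io/jobset-name": js.Name},
		client.PropagationPolicy(metav1.DeletePropagationBackground),
	)
}
```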

mimowo avatar Jul 26 '24 19:07 mimowo

I see. So Jobs delete their Pods on suspend/resume, but JobSet was keeping the Jobs around?

kannon92 avatar Jul 26 '24 19:07 kannon92

Correct. The alternative would be for JobSet to try to update the Jobs instead, but this is rather complex.

First, due to mutability constraints in Jobs, it would require multiple requests, similar to what we do in Kueue.

Second, updating the Jobs with the new Pod template would revert changes made to the Jobs by any create webhooks which users may have.

mimowo avatar Jul 26 '24 20:07 mimowo

Consider the following chain of states: suspend - resume 1 - suspend - resume 2. Here resume 1 and 2 may use different Pod templates (updated during suspend), so we need to recreate the Jobs at some point.

@mimowo what is the current status of this issue? Can we close it? We relaxed Pod template mutability constraints in JobSet in order to allow for this integration with Kueue; does this not solve the issue described in your comment above?

danielvegamyhre avatar Oct 05 '24 17:10 danielvegamyhre

Yeah, it solves 98% of the problems (a guesstimate). Let me summarize the remaining motivation to still do it (maybe as an extra option, so as not to break the default flow):

  1. Transitioning a JobSet via the chain (ResourceFlavor1) -> suspend -> resume (ResourceFlavor2) never removes the nodeSelectors assigned earlier (for ResourceFlavor1); it only overrides them. This might be a problem if ResourceFlavor1 has nodeSelector some-key: some-value, but ResourceFlavor2's nodeSelector does not specify a value for some-key. Then the old value is still kept in the Pod template, potentially preventing kubernetes-level scheduling (see the sketch after this list). However, this is typically not an issue, because ResourceFlavor2 will typically specify a new value for some-key.

  2. For JobSets with many replicated Jobs, it takes a lot of API-server resources even while the JobSet remains queued in Kueue (the original motivation).
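A tiny illustration of the nodeSelector point from item 1, in plain Go (the merge helper below is hypothetical and just mimics how an in-place update of the Pod template behaves):

```go
package main

import "fmt"

// mergeNodeSelector overlays newSel on top of existing, the way an in-place update
// of the Pod template behaves; keys missing from newSel are never removed.
func mergeNodeSelector(existing, newSel map[string]string) map[string]string {
	out := map[string]string{}
	for k, v := range existing {
		out[k] = v
	}
	for k, v := range newSel {
		out[k] = v
	}
	return out
}

func main() {
	flavor1 := map[string]string{"some-key": "some-value"}
	flavor2 := map[string]string{"other-key": "other-value"} // does not set some-key
	fmt.Println(mergeNodeSelector(flavor1, flavor2))
	// Output keeps the stale some-key: some-value entry, which can prevent
	// kube-scheduler from placing the Pods; recreating the Jobs on suspend would
	// start from a clean template instead.
}
```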

Having said that, I'm ok to close it for now, because the remaining issues haven't yet been a problem for any user I know of, so solving them proactively is not a high priority.

mimowo avatar Oct 07 '24 07:10 mimowo

IIUC it could also be helpful for lazy quota reservation for Execution Policy (see comment), but maybe the KEP will go another way, so I'm not sure.

mimowo avatar Oct 07 '24 10:10 mimowo