Set user-customized wait-timeout-seconds for PodGroup-based gang scheduling protocol.

Open SimonCqk opened this issue 3 years ago • 6 comments

What would you like to be added:

Introduce a new field in podGroup.spec named waitTimeoutSeconds or something similar, so that users can configure the wait timeout dynamically.

Why is this needed:

As described in the resource-reservation design doc, this feature is on the TODO list, but I cannot find a related field in the latest API definition. It would help balance the large-job-starving anomaly against blocking too many tasks because of resource reservation. For example, we could scale the maximum reserve time by job replicas or total requested resources.

SimonCqk avatar Jan 25 '22 10:01 SimonCqk

Hey, I guess what you want is an SLA guarantee. If so, you can take a look at the SLA plugin.

Thor-wl avatar Jan 25 '22 12:01 Thor-wl

@Thor-wl Hi, thanks for replying. It seems that the SLA plugin implements the semantics I described above. What I am curious about is whether it is coupled with the Volcano batch Job API. What if a user submits a job from another API group along with a PodGroup (representing a gang entity)? How can Volcano guarantee its SLA?

SimonCqk avatar Jan 25 '22 12:01 SimonCqk

Yes, this ability is currently bound to Volcano Job. @jiangkaihua Is there any plan to ensure SLA for other workloads?

Thor-wl avatar Jan 26 '22 03:01 Thor-wl

Yes, I have proposed PR #1961 to solve it. When a user submits a job through a workload from another API group, such as a ReplicaSet or DaemonSet, k8s creates the pods first and then Volcano is invoked to create a PodGroup for those pods. A PodGroup created from k8s pods this way misses the annotations of the original workload, so configurations passed in the form of annotations are ignored, as in #1901.

So my solution is to fetch annotations from the upper resources by following the pod's ownerReferences and fill them into the PodGroup annotations, so that configurations in annotations are also available for jobs from other API groups.
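The idea can be sketched roughly as follows. This is not the code from PR #1961; `Object`, `OwnerRef`, `lookupFunc`, and the annotation key `example.com/wait-timeout` are simplified, hypothetical stand-ins for the Kubernetes metadata types and a real API lookup.

```go
package main

import "fmt"

// OwnerRef and Object are simplified local stand-ins for Kubernetes object
// metadata; they are assumptions for illustration, not client-go types.
type OwnerRef struct {
	Kind string
	Name string
}

type Object struct {
	Kind        string
	Name        string
	Annotations map[string]string
	Owners      []OwnerRef
}

// lookupFunc is a hypothetical resolver from an owner reference to the
// referenced object (in a real controller this would query the API server).
type lookupFunc func(ref OwnerRef) (Object, bool)

// inheritedAnnotations walks up the owner chain (first owner only, at most
// maxDepth levels) and collects annotations. Annotations set closer to the
// pod take precedence over those found higher up.
func inheritedAnnotations(pod Object, lookup lookupFunc, maxDepth int) map[string]string {
	out := map[string]string{}
	for k, v := range pod.Annotations {
		out[k] = v
	}
	current := pod
	for depth := 0; depth < maxDepth && len(current.Owners) > 0; depth++ {
		owner, ok := lookup(current.Owners[0])
		if !ok {
			break
		}
		for k, v := range owner.Annotations {
			if _, exists := out[k]; !exists {
				out[k] = v
			}
		}
		current = owner
	}
	return out
}

func main() {
	// "example.com/wait-timeout" is a placeholder key, not a real Volcano annotation.
	store := map[string]Object{
		"ReplicaSet/web": {
			Kind:        "ReplicaSet",
			Name:        "web",
			Annotations: map[string]string{"example.com/wait-timeout": "5m"},
		},
	}
	lookup := func(ref OwnerRef) (Object, bool) {
		o, ok := store[ref.Kind+"/"+ref.Name]
		return o, ok
	}
	pod := Object{Kind: "Pod", Name: "web-0", Owners: []OwnerRef{{Kind: "ReplicaSet", Name: "web"}}}

	// These annotations would then be copied onto the PodGroup created for the pod.
	fmt.Println(inheritedAnnotations(pod, lookup, 3))
}
```

The depth limit is there so that a missing or cyclic owner chain cannot make the walk run forever.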

jiangkaihua avatar Jan 26 '22 07:01 jiangkaihua

Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

stale[bot] avatar Apr 27 '22 08:04 stale[bot]

Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

stale[bot] avatar Jul 30 '22 18:07 stale[bot]

Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗

stale[bot] avatar Oct 01 '22 00:10 stale[bot]