feat: Add successpolicy

Open gaocegege opened this issue 3 years ago • 9 comments

SuccessPolicy is used in both PyTorchJob and TFJob. Thus I propose to add it in common.
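
For readers unfamiliar with the field, here is a minimal sketch of what a shared SuccessPolicy type could look like once it lives in common. The constant names mirror those discussed in this thread, but the `JobSucceeded` helper, its parameters, and the exact semantics (the default policy follows the chief replica, `AllWorkers` requires every worker to finish) are assumptions made for illustration, not the code in this PR.

```go
package main

import "fmt"

// SuccessPolicy describes when a training job counts as succeeded.
// Sketch only: the real definition in this PR may differ.
type SuccessPolicy string

const (
	// SuccessPolicyDefault is assumed to mean the job succeeds when the
	// chief (or equivalent leading) replica succeeds.
	SuccessPolicyDefault SuccessPolicy = ""
	// SuccessPolicyAllWorkers is assumed to mean the job succeeds only
	// when every worker replica has succeeded.
	SuccessPolicyAllWorkers SuccessPolicy = "AllWorkers"
)

// JobSucceeded is a hypothetical helper, not part of any Kubeflow API.
// It applies the assumed semantics above to simple replica counts.
func JobSucceeded(policy SuccessPolicy, workers, succeededWorkers int, chiefSucceeded bool) bool {
	switch policy {
	case SuccessPolicyAllWorkers:
		// Require at least one worker, and all of them finished successfully.
		return workers > 0 && succeededWorkers == workers
	default:
		// Fall back to the chief replica's status.
		return chiefSucceeded
	}
}

func main() {
	fmt.Println(JobSucceeded(SuccessPolicyDefault, 3, 1, true))
	fmt.Println(JobSucceeded(SuccessPolicyAllWorkers, 3, 1, true))
	fmt.Println(JobSucceeded(SuccessPolicyAllWorkers, 3, 3, false))
}
```

Keeping the type as a string alias would let PyTorchJob and TFJob share one set of constants while each controller decides how to evaluate them.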

Signed-off-by: cegao [email protected]

gaocegege avatar Dec 09 '21 07:12 gaocegege

/assign @terrytangyuan @Jeffwan @zw0610

gaocegege avatar Dec 09 '21 07:12 gaocegege

It is used in Katib. I think it works, but we should still support successPolicy to keep the API consistent.

cc @andreyvelich

gaocegege avatar Dec 10 '21 02:12 gaocegege

Yes, we are using successCondition in GJSON format in our APIs to define the condition for Katib Trial's Workers, similar to Argo Workflows. We can probably think about how to use it in Training Operators.

andreyvelich avatar Dec 10 '21 08:12 andreyvelich

Let's use https://github.com/kubeflow/training-operator/issues/1507 to track and discuss separately.

terrytangyuan avatar Dec 10 '21 16:12 terrytangyuan

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: terrytangyuan

google-oss-prow[bot] avatar Dec 10 '21 16:12 google-oss-prow[bot]

/assign @zw0610 @Jeffwan

gaocegege avatar Dec 14 '21 03:12 gaocegege

LGTM. Meanwhile, could you add descriptions for SchedulingPolicy and SuccessPolicyAllWorkers to explain the expected behavior?

zw0610 avatar Dec 14 '21 03:12 zw0610

SGTM

gaocegege avatar Dec 14 '21 03:12 gaocegege

/hold

gaocegege avatar Dec 14 '21 03:12 gaocegege