cleanPodPolicy Set to Running should clean Running pod

Open xrmzju opened this issue 4 years ago • 5 comments

In https://github.com/kubeflow/pytorch-operator/blob/047cf0f41e68e030158f532017a226c18827a660/pkg/controller.v1/pytorch/job.go#L160 we simply ignore the Running policy for now.

xrmzju avatar Mar 10 '20 06:03 xrmzju

Issue-Label Bot is automatically applying the labels:

Label Probability
bug 0.57

issue-label-bot[bot] avatar Mar 10 '20 06:03 issue-label-bot[bot]

@gaocegege

xrmzju avatar Mar 10 '20 06:03 xrmzju

When a PyTorchJob fails, all of its replicas should have failed as well, so there is no difference between None and Running. That is why we ignore it. Do you have a problem with that?

/cc @johnugeorge

gaocegege avatar Mar 10 '20 08:03 gaocegege

> When a PyTorchJob fails, all of its replicas should have failed as well, so there is no difference between None and Running. That is why we ignore it. Do you have a problem with that?
>
> /cc @johnugeorge

But in my case that does not seem to hold. Both workers are in Error while the master is still Running:

pytorch-test-master-0                                               1/1     Running                0          4m15s
pytorch-test-worker-0                                               0/1     Error                  0          4m15s
pytorch-test-worker-1                                               0/1     Error                  0          4m15s
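
To make the expected behaviour concrete, here is a minimal sketch of how a cleanup step could honour the policy. It is not the operator's actual code: the `Pod` struct and the `podsToDelete` helper are hypothetical simplifications, and it assumes the `Error` status shown above corresponds to the pod phase `Failed`.

```go
package main

import "fmt"

// CleanPodPolicy models the policy values discussed in this issue.
type CleanPodPolicy string

const (
	CleanPodPolicyNone    CleanPodPolicy = "None"
	CleanPodPolicyRunning CleanPodPolicy = "Running"
	CleanPodPolicyAll     CleanPodPolicy = "All"
)

// Pod is a hypothetical, simplified stand-in for a Kubernetes pod.
type Pod struct {
	Name  string
	Phase string // e.g. "Running", "Failed", "Succeeded"
}

// podsToDelete returns the names of the pods that a finished job
// would clean up under the given policy (illustrative helper only).
func podsToDelete(pods []Pod, policy CleanPodPolicy) []string {
	var names []string
	for _, p := range pods {
		switch policy {
		case CleanPodPolicyAll:
			// Remove every pod regardless of its phase.
			names = append(names, p.Name)
		case CleanPodPolicyRunning:
			// Remove only pods that are still running.
			if p.Phase == "Running" {
				names = append(names, p.Name)
			}
		}
		// CleanPodPolicyNone: keep everything.
	}
	return names
}

func main() {
	pods := []Pod{
		{Name: "pytorch-test-master-0", Phase: "Running"},
		{Name: "pytorch-test-worker-0", Phase: "Failed"},
		{Name: "pytorch-test-worker-1", Phase: "Failed"},
	}
	fmt.Println(podsToDelete(pods, CleanPodPolicyRunning))
}
```

With the three pods listed above, the sketch selects only `pytorch-test-master-0` under `Running` and nothing under `None`, which is the difference I would expect to see.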

xrmzju avatar Mar 11 '20 06:03 xrmzju

@johnugeorge WDYT

gaocegege avatar Mar 11 '20 07:03 gaocegege