
kubeflow job monitor/waiter

Open xyhuang opened this issue 6 years ago • 4 comments

We need a monitor function in the kubebench controller image that polls the status of deployed Kubeflow jobs until a desired status (success, failure, etc.) is reached. The monitor will run in the "wait for job finish" step in #50. It will take a job manifest (generated in the previous step), a success/failure condition, and an optional timeout as input, and return an error if the failure condition is met or the timeout expires. We might take Argo's resource execution functionality as a reference; it already polls Kubernetes resource status, but with a fixed timeout, which causes #17.
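To make the proposed behavior concrete, here is a rough sketch of the polling loop in Python. It is only an illustration, not the kubebench implementation: `wait_for_job`, `JobFailed`, `JobTimedOut`, and the `get_status` callable are hypothetical names, and `get_status` stands in for whatever code would look up the live status of the resource described by the job manifest. The conditions are modeled as plain predicates over that status.

```python
import time
from typing import Callable, Mapping, Optional


class JobFailed(Exception):
    """Raised when the failure condition matches the job status."""


class JobTimedOut(Exception):
    """Raised when the optional timeout expires before a terminal status."""


def wait_for_job(
    get_status: Callable[[], Mapping],          # hypothetical: fetches current job status
    is_success: Callable[[Mapping], bool],      # success condition
    is_failure: Callable[[Mapping], bool],      # failure condition
    timeout: Optional[float] = None,            # seconds; None means wait indefinitely
    interval: float = 5.0,                      # seconds between polls
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping:
    """Poll until success; raise on failure or timeout."""
    deadline = None if timeout is None else clock() + timeout
    while True:
        status = get_status()
        if is_failure(status):
            raise JobFailed(f"failure condition met: {dict(status)}")
        if is_success(status):
            return status
        # The deadline is checked after each poll, so at least one poll always runs.
        if deadline is not None and clock() >= deadline:
            raise JobTimedOut(f"no terminal status after {timeout} seconds")
        sleep(interval)
```

The timeout is a caller-supplied argument rather than a fixed constant, which is the difference from the fixed-timeout behavior mentioned above. `clock` and `sleep` are injectable only so the loop can be exercised without real waiting.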

xyhuang avatar Aug 07 '18 04:08 xyhuang

/priority p1

xyhuang avatar Aug 07 '18 04:08 xyhuang

/assign @ramdootp

xyhuang avatar Oct 08 '18 08:10 xyhuang

https://github.com/argoproj/argo/issues/702 https://github.com/argoproj/argo/issues/763

Argo's event management support would be needed for a cleaner solution.

ramdootp avatar Nov 02 '18 09:11 ramdootp

This comment in particular might help: https://github.com/argoproj/argo/issues/763#issuecomment-381805052

ramdootp avatar Nov 02 '18 09:11 ramdootp