kubebench
Kubeflow job monitor/waiter
We need a monitor function in the kubebench controller image that polls the status of deployed Kubeflow jobs until the desired status (success, failure, etc.) is reached. The monitor will run in the "wait for job finish" step in #50. It will take as input a job manifest (generated in the previous step), a success/failure condition, and an optional timeout, and return an error if the failure condition is met or the timeout expires. We might use Argo's resource execution functionality as a reference: it polls the status of a Kubernetes resource, but with a fixed timeout, which causes #17.
/priority p1
/assign @ramdootp
https://github.com/argoproj/argo/issues/702 https://github.com/argoproj/argo/issues/763
A cleaner solution would need event management support in Argo.
Specifically, this might help: https://github.com/argoproj/argo/issues/763#issuecomment-381805052