kubebench
"step-run" fails during long benchmark run
This happens because the Argo step uses "successCondition" to track the status of the created Kubeflow resource (tfjob). If the "successCondition" is not met within a few minutes, the step times out. However, Kubeflow resources running benchmarks can take longer than the step waits. We need a more robust way to track the status of the created Kubeflow resources.
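One possible direction, as a rough sketch rather than a proposed patch: replace the fixed wait with an explicit polling loop whose timeout is configurable (or disabled). The code below assumes the job object exposes `status.conditions` entries with `type` set to `Succeeded` or `Failed` and `status` set to `"True"`. The `wait_for_job` name and the `get_job` callable are hypothetical; `get_job` stands in for whatever call fetches the tfjob from the cluster.

```python
import time
from typing import Callable, Optional


def wait_for_job(
    get_job: Callable[[], dict],
    timeout_s: Optional[float] = None,
    poll_interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll a job until it reaches a terminal condition.

    get_job is a hypothetical callable returning the job as a dict.
    Returns "Succeeded" or "Failed"; raises TimeoutError if timeout_s
    elapses first. timeout_s=None means wait indefinitely.
    """
    deadline = None if timeout_s is None else clock() + timeout_s
    while True:
        job = get_job()
        # Assumed layout: job["status"]["conditions"] is a list of dicts.
        conditions = (job.get("status") or {}).get("conditions") or []
        for cond in conditions:
            if cond.get("type") in ("Succeeded", "Failed") and cond.get("status") == "True":
                return cond["type"]
        if deadline is not None and clock() >= deadline:
            raise TimeoutError("job did not reach a terminal condition in time")
        sleep(poll_interval_s)
```

With something like this, a long benchmark would only fail the step when the job itself reports failure or when a deliberately chosen timeout expires, not because of a short built-in wait.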
/priority p2
Is this still an open issue, and is it being handled?