benchmark-runner wait_for_pod_completed does not detect pod going into error state

Open RobertKrawitz opened this issue 3 years ago • 1 comment

If a pod goes into an error state, wait_for_pod_completed does not detect it and hangs until the timeout expires:

Method name: wait_for_initialized {'label': 'app=system-metrics-collector', 'workload': 'uperf-vm'} , Start time: 2021-12-09 09:19:22 
Method name: wait_for_initialized , End time: 2021-12-09 09:19:22 , Total time: 0.12 sec
Method name: wait_for_pod_completed {'label': 'app=system-metrics-collector', 'workload': 'uperf-vm'} , Start time: 2021-12-09 09:19:22

But the pod has gone into an error state:

# oc logs -n benchmark-operator system-metrics-collector-2d2e4517-t6dd2
time="2021-12-09 14:22:47" level=info msg="📁 Creating indexer: elastic"
time="2021-12-09 14:24:57" level=fatal msg="ES health check failed: dial tcp XXX.XXX.XXX.XXX:YYYYY: connect: connection timed out"

Dec 09 '21 15:12 RobertKrawitz

The failure in this case is external (my Elasticsearch server isn't running correctly), but wait_for_pod_completed should still detect the pod failure and error out instead of waiting for the timeout.
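
To illustrate the behavior I'd expect, here is a minimal sketch of a wait loop that treats a failed pod as a terminal state. It is not the benchmark-runner implementation: the function name wait_for_pod_completed_sketch, the PodFailedError class, and the injected get_phase callable are all hypothetical names made up for this example. It assumes get_phase returns the Kubernetes pod phase string (Pending, Running, Succeeded, Failed, or Unknown), which in a real setup could come from something like `oc get pod <name> -o jsonpath='{.status.phase}'`.

```python
import time


class PodFailedError(RuntimeError):
    """Hypothetical error raised when the pod reaches the Failed phase."""


def wait_for_pod_completed_sketch(get_phase, timeout=300.0, sleep_time=3.0,
                                  sleep=time.sleep):
    """Poll get_phase() until the pod succeeds, fails, or the timeout expires.

    get_phase is an assumed callable returning the pod phase as a string.
    sleep is injectable so the loop can be exercised without real delays.
    """
    deadline = time.monotonic() + timeout
    while True:
        phase = get_phase()
        if phase == "Succeeded":
            return True
        if phase == "Failed":
            # Fail fast instead of polling until the timeout.
            raise PodFailedError(f"pod entered phase {phase!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pod still in phase {phase!r} after {timeout} sec")
        sleep(sleep_time)
```

With a check like this, the run above would have stopped as soon as the collector pod failed rather than hanging until the timeout. Whether the real method should raise or return a status is a design choice for the maintainers.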

Dec 09 '21 15:12 RobertKrawitz
