benchmark-runner
wait_for_pod_completed does not detect pod going into error state
If a pod goes into an error state, wait_for_pod_completed does not detect it and hangs until the timeout expires:
Method name: wait_for_initialized {'label': 'app=system-metrics-collector', 'workload': 'uperf-vm'} , Start time: 2021-12-09 09:19:22
Method name: wait_for_initialized , End time: 2021-12-09 09:19:22 , Total time: 0.12 sec
Method name: wait_for_pod_completed {'label': 'app=system-metrics-collector', 'workload': 'uperf-vm'} , Start time: 2021-12-09 09:19:22
But the pod has gone into an error state:
# oc logs -n benchmark-operator system-metrics-collector-2d2e4517-t6dd2
time="2021-12-09 14:22:47" level=info msg="📁 Creating indexer: elastic"
time="2021-12-09 14:24:57" level=fatal msg="ES health check failed: dial tcp XXX.XXX.XXX.XXX:YYYYY: connect: connection timed out"
The failure in this case is external (my Elasticsearch server isn't running correctly), but wait_for_pod_completed should still detect the pod failure and error out immediately instead of waiting for the timeout.
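A rough sketch of the behavior I'd expect, not the actual benchmark-runner implementation. The function name `wait_for_pod_completed_fail_fast`, the `PodFailedError` exception, and the `get_phase` callable are all hypothetical; `get_phase` is assumed to return the pod's Kubernetes phase (`Pending`, `Running`, `Succeeded`, `Failed`, or `Unknown`) each time it is called.

```python
import time


class PodFailedError(Exception):
    """Hypothetical exception: the pod reached the terminal 'Failed' phase."""


def wait_for_pod_completed_fail_fast(get_phase, timeout=300, sleep_time=3):
    """Poll the pod phase until it succeeds, fails, or the timeout expires.

    get_phase is an assumed callable that returns the current pod phase
    as a string; how it queries the cluster is left to the caller.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        phase = get_phase()
        if phase == "Succeeded":
            return True
        if phase == "Failed":
            # Fail fast instead of waiting for the timeout to expire.
            raise PodFailedError("pod entered the Failed phase")
        time.sleep(sleep_time)
    raise TimeoutError(f"pod did not complete within {timeout} seconds")
```

One possible way to supply `get_phase` would be to wrap `oc get pod <pod-name> -n benchmark-operator -o jsonpath='{.status.phase}'`. The point is only that a terminal failure is checked on every poll, so a crashed pod is reported right away.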