kubeface
Expose a way to understand which tasks have failed
Currently a task can be in one of the following states:
- completed (detected by presence of the result file in the bucket)
- submitted (we submitted this task but don't see a result for it currently)
- reused (from a previous run with the same cache-key)
There is no distinction between 'submitted but failed' and 'submitted and still running'.
It might be useful to expose this information in KubernetesBackend. One simple implementation would be to search the output of kubectl get pods for the task name. If the name is found, the task has not failed. If it is missing and there is no result for the task in the bucket, then it must have failed.
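
A minimal sketch of that check, to make the proposal concrete. This is not existing kubeface code: the function names `list_pod_names` and `classify_task`, the state strings, and the assumption that each pod's name contains the task name are all hypothetical.

```python
import subprocess


def list_pod_names():
    """Return the names of all pods reported by `kubectl get pods`.

    Assumes kubectl is on PATH and configured for the right cluster.
    """
    out = subprocess.check_output(
        ["kubectl", "get", "pods", "--no-headers",
         "-o", "custom-columns=NAME:.metadata.name"],
        universal_newlines=True)
    return [line.strip() for line in out.splitlines() if line.strip()]


def classify_task(task_name, result_names, pod_names):
    """Classify a submitted task as 'completed', 'running', or 'failed'.

    result_names: names of tasks that have a result file in the bucket.
    pod_names: pod names, e.g. from list_pod_names().
    Assumption: a task's pod name contains the task name as a substring,
    so task names must not be prefixes of one another.
    """
    if task_name in result_names:
        return "completed"
    if any(task_name in pod for pod in pod_names):
        # A pod still exists for this task, so treat it as not failed.
        return "running"
    # No result in the bucket and no pod: the task must have failed.
    return "failed"
```

For example, `classify_task("task-3", set(), ["kubeface-task-2-abc"])` returns `"failed"`. One caveat of this simple approach: a pod that is still listed but in an error state would be reported as running, so checking the pod's status column may be a useful refinement.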