training-operator
Flaky test: [It] should update TFJob with desired status
Chief worker is succeeded
• [FAILED] [0.026 seconds]
TFJob controller Test Status [It] should update TFJob with desired status
/home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:76
Timeline >>
2023-05-31T14:16:30Z INFO testing case {"description": "Chief worker is succeeded"}
2023-05-31T14:16:30Z DEBUG events TFJob default/test-tfjob has failed because 1 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-tfjob"}, "reason": "TFJobFailed"}
2023-05-31T14:16:30Z INFO TFJob.kubeflow.org "test-status-0" not found {"tfjob": "default/test-status-0", "unable to fetch TFJob": "default/test-status-0"}
2023-05-31T14:16:30Z INFO TFJob.kubeflow.org "test-status-0" not found {"tfjob": "default/test-status-0", "unable to fetch TFJob": "default/test-status-0"}
2023-05-31T14:16:30Z INFO TFJob.kubeflow.org "test-status-0" not found {"tfjob": "default/test-status-0", "unable to fetch TFJob": "default/test-status-0"}
2023-05-31T14:16:30Z DEBUG events Created service: test-status-0-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"a24bd44b-a9de-48dc-a982-e54353988325"}, "reason": "SuccessfulCreateService"}
2023-05-31T14:16:30Z INFO TFJob.kubeflow.org "test-status-0" not found {"tfjob": "default/test-status-0", "unable to fetch TFJob": "default/test-status-0"}
2023-05-31T14:16:30Z DEBUG events Created service: test-status-0-chief-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"a24bd44b-a9de-48dc-a982-e54353988325"}, "reason": "SuccessfulCreateService"}
2023-05-31T14:16:30Z INFO KubeAPIWarningLogger unknown field "spec.tfReplicaSpecs.Chief.template.metadata.creationTimestamp"
2023-05-31T14:16:30Z INFO KubeAPIWarningLogger unknown field "spec.tfReplicaSpecs.Worker.template.metadata.creationTimestamp"
2023-05-31T14:16:30Z INFO checking status {"tfJob.Status": {"conditions":null,"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-05-31T14:16:30Z"}}
[FAILED] in [It] - /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 05/31/23 14:16:30.732
<< Timeline
[FAILED] Expected
<bool>: false
to be true
In [It] at: /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 05/31/23 14:16:30.732
------------------------------
https://github.com/kubeflow/training-operator/actions/runs/5133950363/jobs/9237255811#step:4:126
Similar flaky test: (No chief worker) Worker is running
------------------------------
• [FAILED] [0.098 seconds]
TFJob controller Test Status [It] should update TFJob with desired status
/home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:76
...
Timeline >>
2023-06-04T17:26:13Z INFO TFJob.kubeflow.org "test-status-4" not found {"tfjob": "default/test-status-4", "unable to fetch TFJob": "default/test-status-4"}
2023-06-04T17:26:13Z INFO TFJob.kubeflow.org "test-status-4" not found {"tfjob": "default/test-status-4", "unable to fetch TFJob": "default/test-status-4"}
2023-06-04T17:26:13Z DEBUG events Created service: test-status-4-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"8d78a0a3-fb74-416b-902b-1cb0794616e1"}, "reason": "SuccessfulCreateService"}
2023-06-04T17:26:13Z DEBUG events TFJob default/test-status-4 successfully completed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"8d78a0a3-fb74-416b-902b-1cb0794616e1"}, "reason": "TFJobSucceeded"}
2023-06-04T17:26:13Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-4 successfully completed.","lastUpdateTime":"2023-06-04T17:26:13Z","lastTransitionTime":"2023-06-04T17:26:13Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-06-04T17:26:13Z","completionTime":"2023-06-04T17:26:13Z"}}
2023-06-04T17:26:13Z INFO passed! {"job name": "test-status-4", "job description": "(No chief worker) Worker is succeeded"}
2023-06-04T17:26:13Z INFO testing case {"description": "(No chief worker) Worker is running"}
2023-06-04T17:26:13Z INFO checking status {"tfJob.Status": {"conditions":null,"replicaStatuses":{"Chief":{},"PS":{},"Worker":{}}}}
[FAILED] in [It] - /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 06/04/23 17:26:13.951
<< Timeline
[FAILED] Expected
<bool>: false
to be true
In [It] at: /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 06/04/23 17:26:13.951
https://github.com/kubeflow/training-operator/actions/runs/5170338558/jobs/9313174311?pr=1824#step:4:406
Another flaky test, "should create missing Pods": https://github.com/kubeflow/training-operator/actions/runs/5312480601/jobs/9617036307?pr=1837
@lowang-bh Thanks for reporting that. However, that case doesn't appear to be related to this test, so I created a separate issue:
https://github.com/kubeflow/training-operator/issues/1838
Similar flaky test:
------------------------------
• [FAILED] [0.531 seconds]
TFJob controller Test Status [It] should update TFJob with desired status
/home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:76
Timeline >>
2023-07-03T15:53:48Z INFO testing case {"description": "Chief worker is succeeded"}
2023-07-03T15:53:48Z DEBUG events TFJob default/test-tfjob has failed because 1 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-tfjob"}, "reason": "TFJobFailed"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-0" not found {"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-0" not found {"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-0" not found {"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-0-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-0-chief-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z DEBUG events TFJob default/test-status-0 successfully completed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "TFJobSucceeded"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-0" not found {"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
2023-07-03T15:53:48Z INFO KubeAPIWarningLogger unknown field "spec.tfReplicaSpecs.Chief.template.metadata.creationTimestamp"
2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-0 successfully completed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"succeeded":1},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-0", "job description": "Chief worker is succeeded"}
2023-07-03T15:53:48Z INFO testing case {"description": "Chief worker is running"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-1" not found {"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-1-chief-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-1" not found {"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-1" not found {"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
2023-07-03T15:53:48Z DEBUG events Created pod: test-status-1-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreatePod"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-1" not found {"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-1-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-1 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"active":1},"PS":{},"Worker":{}},"startTime":"2023-07-03T15:53:48Z"}}
2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-1", "job description": "Chief worker is running"}
2023-07-03T15:53:48Z INFO testing case {"description": "Chief worker is failed"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-2" not found {"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-2" not found {"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
2023-07-03T15:53:48Z DEBUG events Created pod: test-status-2-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreatePod"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-2" not found {"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-2-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-2" not found {"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-2-chief-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z DEBUG events TFJob default/test-status-2 has failed because 1 Chief replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "TFJobFailed"}
2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-2 has failed because 1 Chief replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"failed":1},"PS":{},"Worker":{}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-2", "job description": "Chief worker is failed"}
2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) Worker is failed"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-3" not found {"tfjob": {"name":"test-status-3","namespace":"default"}, "unable to fetch TFJob": "default/test-status-3"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-3" not found {"tfjob": {"name":"test-status-3","namespace":"default"}, "unable to fetch TFJob": "default/test-status-3"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-3-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-3","uid":"629f34d4-36bd-461b-ab76-b667b82f26de"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z DEBUG events TFJob default/test-status-3 has failed because 1 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-3","uid":"629f34d4-36bd-461b-ab76-b667b82f26de"}, "reason": "TFJobFailed"}
2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-3 has failed because 1 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"failed":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-3", "job description": "(No chief worker) Worker is failed"}
2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) Worker is succeeded"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-4" not found {"tfjob": {"name":"test-status-4","namespace":"default"}, "unable to fetch TFJob": "default/test-status-4"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-4" not found {"tfjob": {"name":"test-status-4","namespace":"default"}, "unable to fetch TFJob": "default/test-status-4"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-4-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"4d770ef0-c20a-4743-a5c0-d0311b742723"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z DEBUG events TFJob default/test-status-4 successfully completed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"4d770ef0-c20a-4743-a5c0-d0311b742723"}, "reason": "TFJobSucceeded"}
2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-4 successfully completed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-4", "job description": "(No chief worker) Worker is succeeded"}
2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) Worker is running"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-5" not found {"tfjob": {"name":"test-status-5","namespace":"default"}, "unable to fetch TFJob": "default/test-status-5"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-5" not found {"tfjob": {"name":"test-status-5","namespace":"default"}, "unable to fetch TFJob": "default/test-status-5"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-5-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-5","uid":"9d893137-d9f5-4e03-9112-ffd9e07d3dce"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-5 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":1}},"startTime":"2023-07-03T15:53:48Z"}}
2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-5", "job description": "(No chief worker) Worker is running"}
2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) 2 workers are succeeded, 2 workers are active"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-ps-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-ps-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z INFO KubeAPIWarningLogger unknown field "spec.tfReplicaSpecs.PS.template.metadata.creationTimestamp"
2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-6 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":2,"succeeded":2}},"startTime":"2023-07-03T15:53:48Z"}}
2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-6", "job description": "(No chief worker) 2 workers are succeeded, 2 workers are active"}
2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) 2 workers are running, 2 workers are failed"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-ps-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-ps-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z DEBUG events TFJob default/test-status-7 has failed because 2 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "TFJobFailed"}
2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-7 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-7 has failed because 2 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":2,"failed":2}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-7", "job description": "(No chief worker) 2 workers are running, 2 workers are failed"}
2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) 2 workers are succeeded, 2 workers are failed"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-8-ps-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-8-ps-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-8-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-8-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-8-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-8-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z DEBUG events TFJob default/test-status-8 has failed because 2 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "TFJobFailed"}
2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-8 has failed because 2 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"succeeded":2,"failed":2}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
2023-07-03T15:53:49Z INFO passed! {"job name": "test-status-8", "job description": "(No chief worker) 2 workers are succeeded, 2 workers are failed"}
2023-07-03T15:53:49Z INFO testing case {"description": "(No chief worker) worker-0 are succeeded, 3 workers are active"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z DEBUG events Pod: default.test-status-9-worker-0 exited with code 0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "ExitedWithCode"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-ps-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-ps-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z DEBUG events TFJob default/test-status-9 successfully completed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "TFJobSucceeded"}
2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-9 successfully completed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":3,"succeeded":1}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
2023-07-03T15:53:49Z INFO passed! {"job name": "test-status-9", "job description": "(No chief worker) worker-0 are succeeded, 3 workers are active"}
2023-07-03T15:53:49Z INFO testing case {"description": "(No chief worker, successPolicy: AllWorkers) worker-0 are succeeded, 3 workers are active"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
2023-07-03T15:53:49Z DEBUG events Pod: default.test-status-10-worker-0 exited with code 0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "ExitedWithCode"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-10-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-10-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-10-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-10-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-10 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":3,"succeeded":1}},"startTime":"2023-07-03T15:53:49Z"}}
2023-07-03T15:53:49Z INFO passed! {"job name": "test-status-10", "job description": "(No chief worker, successPolicy: AllWorkers) worker-0 are succeeded, 3 workers are active"}
2023-07-03T15:53:49Z INFO testing case {"description": "(No chief worker, successPolicy: AllWorkers) 4 workers are succeeded"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
2023-07-03T15:53:49Z DEBUG events Pod: default.test-status-11-worker-0 exited with code 0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "ExitedWithCode"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-11-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-11-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-11-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-11-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z DEBUG events TFJob default/test-status-11 successfully completed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "TFJobSucceeded"}
2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-11 successfully completed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":4}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
2023-07-03T15:53:49Z INFO passed! {"job name": "test-status-11", "job description": "(No chief worker, successPolicy: AllWorkers) 4 workers are succeeded"}
2023-07-03T15:53:49Z INFO testing case {"description": "(No chief worker, successPolicy: AllWorkers) worker-0 is succeeded, 2 workers are running, 1 worker is failed"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
2023-07-03T15:53:49Z DEBUG events Pod: default.test-status-12-worker-0 exited with code 0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "ExitedWithCode"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-12-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-12-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-12-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-12-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z DEBUG events TFJob default/test-status-12 has failed because 1 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "TFJobFailed"}
2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-12 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-12 has failed because 1 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":2,"succeeded":1,"failed":1}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
2023-07-03T15:53:49Z INFO passed! {"job name": "test-status-12", "job description": "(No chief worker, successPolicy: AllWorkers) worker-0 is succeeded, 2 workers are running, 1 worker is failed"}
2023-07-03T15:53:49Z INFO testing case {"description": "Chief is running, workers are failed"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-ps-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-ps-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-chief-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z DEBUG events TFJob default/test-status-13 has failed because 4 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "TFJobFailed"}
2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-13 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-13 has failed because 4 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{"active":1},"PS":{"active":2},"Worker":{"failed":4}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
2023-07-03T15:53:49Z INFO passed! {"job name": "test-status-13", "job description": "Chief is running, workers are failed"}
2023-07-03T15:53:49Z INFO testing case {"description": "Chief is running, workers are succeeded"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":null,"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"succeeded":4}}}}
[FAILED] in [It] - /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 07/03/23 15:53:49.286
<< Timeline
[FAILED] Expected
<bool>: false
to be true
In [It] at: /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 07/03/23 15:53:49.286
------------------------------
https://github.com/kubeflow/training-operator/actions/runs/5446399997/jobs/9907080100?pr=1843#step:4:345
/assign
I couldn't reproduce these failures locally with the following command:
KUBEBUILDER_ASSETS="$(shell setup-envtest use $(ENVTEST_K8S_VERSION) -p path)" \
./bin/ginkgo --until-it-fails -v ./pkg/controller.v1/tensorflow/...
I guess these tests take a bit of time, so we should increase the timeout for gomega.Eventually, because the CI environment has much more limited computing resources than a local machine.
The default timeout for gomega.Eventually is 1s.
That is a good point
I'm working on this improvement.
Only this one has recurred:
- https://github.com/kubeflow/training-operator/actions/runs/5451187547/jobs/9917173825#step:4:510
- https://github.com/kubeflow/training-operator/actions/runs/5451187547/jobs/9917173661#step:4:831
/reopen
Similar flaky test:
------------------------------ • [FAILED] [0.531 seconds] TFJob controller Test Status [It] should update TFJob with desired status /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:76 Timeline >> 2023-07-03T15:53:48Z INFO testing case {"description": "Chief worker is succeeded"} 2023-07-03T15:53:48Z DEBUG events TFJob default/test-tfjob has failed because 1 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-tfjob"}, "reason": "TFJobFailed"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-0" not found {"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-0" not found {"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-0" not found {"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-0-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-0-chief-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z DEBUG events TFJob default/test-status-0 successfully completed. 
{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "TFJobSucceeded"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-0" not found {"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"} 2023-07-03T15:53:48Z INFO KubeAPIWarningLogger unknown field "spec.tfReplicaSpecs.Chief.template.metadata.creationTimestamp" 2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-0 successfully completed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"succeeded":1},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}} 2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-0", "job description": "Chief worker is succeeded"} 2023-07-03T15:53:48Z INFO testing case {"description": "Chief worker is running"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-1" not found {"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-1-chief-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-1" not found {"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-1" not found {"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"} 2023-07-03T15:53:48Z DEBUG events Created pod: test-status-1-worker-0 {"type": "Normal", "object": 
{"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreatePod"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-1" not found {"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-1-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-1 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"active":1},"PS":{},"Worker":{}},"startTime":"2023-07-03T15:53:48Z"}} 2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-1", "job description": "Chief worker is running"} 2023-07-03T15:53:48Z INFO testing case {"description": "Chief worker is failed"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-2" not found {"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-2" not found {"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"} 2023-07-03T15:53:48Z DEBUG events Created pod: test-status-2-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreatePod"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-2" not found {"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-2-worker-0 {"type": "Normal", 
"object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-2" not found {"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-2-chief-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z DEBUG events TFJob default/test-status-2 has failed because 1 Chief replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "TFJobFailed"} 2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-2 has failed because 1 Chief replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"failed":1},"PS":{},"Worker":{}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}} 2023-07-03T15:53:48Z INFO passed! 
{"job name": "test-status-2", "job description": "Chief worker is failed"} 2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) Worker is failed"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-3" not found {"tfjob": {"name":"test-status-3","namespace":"default"}, "unable to fetch TFJob": "default/test-status-3"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-3" not found {"tfjob": {"name":"test-status-3","namespace":"default"}, "unable to fetch TFJob": "default/test-status-3"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-3-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-3","uid":"629f34d4-36bd-461b-ab76-b667b82f26de"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z DEBUG events TFJob default/test-status-3 has failed because 1 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-3","uid":"629f34d4-36bd-461b-ab76-b667b82f26de"}, "reason": "TFJobFailed"} 2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-3 has failed because 1 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"failed":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}} 2023-07-03T15:53:48Z INFO passed! 
{"job name": "test-status-3", "job description": "(No chief worker) Worker is failed"} 2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) Worker is succeeded"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-4" not found {"tfjob": {"name":"test-status-4","namespace":"default"}, "unable to fetch TFJob": "default/test-status-4"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-4" not found {"tfjob": {"name":"test-status-4","namespace":"default"}, "unable to fetch TFJob": "default/test-status-4"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-4-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"4d770ef0-c20a-4743-a5c0-d0311b742723"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z DEBUG events TFJob default/test-status-4 successfully completed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"4d770ef0-c20a-4743-a5c0-d0311b742723"}, "reason": "TFJobSucceeded"} 2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-4 successfully completed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}} 2023-07-03T15:53:48Z INFO passed! 
{"job name": "test-status-4", "job description": "(No chief worker) Worker is succeeded"} 2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) Worker is running"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-5" not found {"tfjob": {"name":"test-status-5","namespace":"default"}, "unable to fetch TFJob": "default/test-status-5"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-5" not found {"tfjob": {"name":"test-status-5","namespace":"default"}, "unable to fetch TFJob": "default/test-status-5"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-5-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-5","uid":"9d893137-d9f5-4e03-9112-ffd9e07d3dce"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-5 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":1}},"startTime":"2023-07-03T15:53:48Z"}} 2023-07-03T15:53:48Z INFO passed! 
{"job name": "test-status-5", "job description": "(No chief worker) Worker is running"} 2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) 2 workers are succeeded, 2 workers are active"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-ps-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 
2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-ps-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z INFO KubeAPIWarningLogger 
unknown field "spec.tfReplicaSpecs.PS.template.metadata.creationTimestamp" 2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-6 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":2,"succeeded":2}},"startTime":"2023-07-03T15:53:48Z"}} 2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-6", "job description": "(No chief worker) 2 workers are succeeded, 2 workers are active"} 2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) 2 workers are running, 2 workers are failed"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 
2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-ps-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org 
"test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-ps-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z DEBUG events TFJob default/test-status-7 has failed because 2 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "TFJobFailed"} 2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-7 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-7 has failed because 2 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":2,"failed":2}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}} 2023-07-03T15:53:48Z INFO passed! 
{"job name": "test-status-7", "job description": "(No chief worker) 2 workers are running, 2 workers are failed"} 2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) 2 workers are succeeded, 2 workers are failed"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-8-ps-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-8-ps-1 {"type": 
"Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-8-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-8-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-8-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-8-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z DEBUG events TFJob default/test-status-8 has failed because 2 Worker replica(s) failed. 
{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "TFJobFailed"} 2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-8 has failed because 2 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"succeeded":2,"failed":2}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}} 2023-07-03T15:53:49Z INFO passed! {"job name": "test-status-8", "job description": "(No chief worker) 2 workers are succeeded, 2 workers are failed"} 2023-07-03T15:53:49Z INFO testing case {"description": "(No chief worker) worker-0 are succeeded, 3 workers are active"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z DEBUG 
events Pod: default.test-status-9-worker-0 exited with code 0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "ExitedWithCode"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-worker-3 {"type": "Normal", "object": 
{"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-ps-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-ps-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z DEBUG events TFJob default/test-status-9 successfully completed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "TFJobSucceeded"} 2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-9 successfully completed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":3,"succeeded":1}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}} 2023-07-03T15:53:49Z INFO passed! 
{"job name": "test-status-9", "job description": "(No chief worker) worker-0 are succeeded, 3 workers are active"} 2023-07-03T15:53:49Z INFO testing case {"description": "(No chief worker, successPolicy: AllWorkers) worker-0 are succeeded, 3 workers are active"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"} 2023-07-03T15:53:49Z DEBUG events Pod: default.test-status-10-worker-0 exited with code 0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "ExitedWithCode"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-10-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-10-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not 
found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-10-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-10-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-10 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":3,"succeeded":1}},"startTime":"2023-07-03T15:53:49Z"}} 2023-07-03T15:53:49Z INFO passed! 
{"job name": "test-status-10", "job description": "(No chief worker, successPolicy: AllWorkers) worker-0 are succeeded, 3 workers are active"} 2023-07-03T15:53:49Z INFO testing case {"description": "(No chief worker, successPolicy: AllWorkers) 4 workers are succeeded"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"} 2023-07-03T15:53:49Z DEBUG events Pod: default.test-status-11-worker-0 exited with code 0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "ExitedWithCode"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-11-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-11-worker-1 {"type": "Normal", "object": 
{"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-11-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-11-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z DEBUG events TFJob default/test-status-11 successfully completed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "TFJobSucceeded"} 2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-11 successfully completed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":4}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}} 2023-07-03T15:53:49Z INFO passed! 
{"job name": "test-status-11", "job description": "(No chief worker, successPolicy: AllWorkers) 4 workers are succeeded"} 2023-07-03T15:53:49Z INFO testing case {"description": "(No chief worker, successPolicy: AllWorkers) worker-0 is succeeded, 2 workers are running, 1 worker is failed"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"} 2023-07-03T15:53:49Z DEBUG events Pod: default.test-status-12-worker-0 exited with code 0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "ExitedWithCode"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-12-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-12-worker-1 {"type": "Normal", "object": 
{"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-12-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-12-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z DEBUG events TFJob default/test-status-12 has failed because 1 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "TFJobFailed"} 2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-12 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-12 has failed because 1 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":2,"succeeded":1,"failed":1}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}} 2023-07-03T15:53:49Z INFO passed! 
{"job name": "test-status-12", "job description": "(No chief worker, successPolicy: AllWorkers) worker-0 is succeeded, 2 workers are running, 1 worker is failed"} 2023-07-03T15:53:49Z INFO testing case {"description": "Chief is running, workers are failed"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-worker-0 {"type": "Normal", "object": 
{"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-ps-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z 
INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-ps-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-chief-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z DEBUG events TFJob default/test-status-13 has failed because 4 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "TFJobFailed"} 2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-13 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-13 has failed because 4 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{"active":1},"PS":{"active":2},"Worker":{"failed":4}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}} 2023-07-03T15:53:49Z INFO passed! 
{"job name": "test-status-13", "job description": "Chief is running, workers are failed"}
2023-07-03T15:53:49Z INFO testing case {"description": "Chief is running, workers are succeeded"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":null,"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"succeeded":4}}}}
[FAILED] in [It] - /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 07/03/23 15:53:49.286
<< Timeline
[FAILED] Expected <bool>: false to be true
In [It] at: /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 07/03/23 15:53:49.286
------------------------------
https://github.com/kubeflow/training-operator/actions/runs/5446399997/jobs/9907080100?pr=1843#step:4:345
@tenzen-y: Reopened this issue.
In response to this:
Only this one has recurred:
https://github.com/kubeflow/training-operator/actions/runs/5451187547/jobs/9917173825#step:4:510
/reopen
Similar flaky test:
------------------------------
• [FAILED] [0.531 seconds]
TFJob controller Test Status [It] should update TFJob with desired status
/home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:76
Timeline >>
2023-07-03T15:53:48Z INFO testing case {"description": "Chief worker is succeeded"}
2023-07-03T15:53:48Z DEBUG events TFJob default/test-tfjob has failed because 1 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-tfjob"}, "reason": "TFJobFailed"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-0" not found {"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-0" not found {"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-0" not found {"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-0-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-0-chief-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z DEBUG events TFJob default/test-status-0 successfully completed. 
{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "TFJobSucceeded"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-0" not found {"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"} 2023-07-03T15:53:48Z INFO KubeAPIWarningLogger unknown field "spec.tfReplicaSpecs.Chief.template.metadata.creationTimestamp" 2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-0 successfully completed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"succeeded":1},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}} 2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-0", "job description": "Chief worker is succeeded"} 2023-07-03T15:53:48Z INFO testing case {"description": "Chief worker is running"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-1" not found {"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-1-chief-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-1" not found {"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-1" not found {"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"} 2023-07-03T15:53:48Z DEBUG events Created pod: test-status-1-worker-0 {"type": "Normal", "object": 
{"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreatePod"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-1" not found {"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-1-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-1 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"active":1},"PS":{},"Worker":{}},"startTime":"2023-07-03T15:53:48Z"}} 2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-1", "job description": "Chief worker is running"} 2023-07-03T15:53:48Z INFO testing case {"description": "Chief worker is failed"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-2" not found {"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-2" not found {"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"} 2023-07-03T15:53:48Z DEBUG events Created pod: test-status-2-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreatePod"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-2" not found {"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-2-worker-0 {"type": "Normal", 
"object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-2" not found {"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-2-chief-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z DEBUG events TFJob default/test-status-2 has failed because 1 Chief replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "TFJobFailed"} 2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-2 has failed because 1 Chief replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"failed":1},"PS":{},"Worker":{}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}} 2023-07-03T15:53:48Z INFO passed! 
{"job name": "test-status-2", "job description": "Chief worker is failed"} 2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) Worker is failed"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-3" not found {"tfjob": {"name":"test-status-3","namespace":"default"}, "unable to fetch TFJob": "default/test-status-3"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-3" not found {"tfjob": {"name":"test-status-3","namespace":"default"}, "unable to fetch TFJob": "default/test-status-3"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-3-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-3","uid":"629f34d4-36bd-461b-ab76-b667b82f26de"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z DEBUG events TFJob default/test-status-3 has failed because 1 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-3","uid":"629f34d4-36bd-461b-ab76-b667b82f26de"}, "reason": "TFJobFailed"} 2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-3 has failed because 1 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"failed":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}} 2023-07-03T15:53:48Z INFO passed! 
{"job name": "test-status-3", "job description": "(No chief worker) Worker is failed"} 2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) Worker is succeeded"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-4" not found {"tfjob": {"name":"test-status-4","namespace":"default"}, "unable to fetch TFJob": "default/test-status-4"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-4" not found {"tfjob": {"name":"test-status-4","namespace":"default"}, "unable to fetch TFJob": "default/test-status-4"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-4-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"4d770ef0-c20a-4743-a5c0-d0311b742723"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z DEBUG events TFJob default/test-status-4 successfully completed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"4d770ef0-c20a-4743-a5c0-d0311b742723"}, "reason": "TFJobSucceeded"} 2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-4 successfully completed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}} 2023-07-03T15:53:48Z INFO passed! 
{"job name": "test-status-4", "job description": "(No chief worker) Worker is succeeded"} 2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) Worker is running"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-5" not found {"tfjob": {"name":"test-status-5","namespace":"default"}, "unable to fetch TFJob": "default/test-status-5"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-5" not found {"tfjob": {"name":"test-status-5","namespace":"default"}, "unable to fetch TFJob": "default/test-status-5"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-5-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-5","uid":"9d893137-d9f5-4e03-9112-ffd9e07d3dce"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-5 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":1}},"startTime":"2023-07-03T15:53:48Z"}} 2023-07-03T15:53:48Z INFO passed! 
{"job name": "test-status-5", "job description": "(No chief worker) Worker is running"} 2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) 2 workers are succeeded, 2 workers are active"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-ps-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 
2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-ps-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"} 2023-07-03T15:53:48Z INFO KubeAPIWarningLogger 
unknown field "spec.tfReplicaSpecs.PS.template.metadata.creationTimestamp" 2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-6 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":2,"succeeded":2}},"startTime":"2023-07-03T15:53:48Z"}} 2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-6", "job description": "(No chief worker) 2 workers are succeeded, 2 workers are active"} 2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) 2 workers are running, 2 workers are failed"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 
2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-ps-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org 
"test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-ps-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z DEBUG events TFJob default/test-status-7 has failed because 2 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "TFJobFailed"} 2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-7 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-7 has failed because 2 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":2,"failed":2}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}} 2023-07-03T15:53:48Z INFO passed! 
{"job name": "test-status-7", "job description": "(No chief worker) 2 workers are running, 2 workers are failed"} 2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) 2 workers are succeeded, 2 workers are failed"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-8-ps-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:48Z DEBUG events Created service: test-status-8-ps-1 {"type": 
"Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-8-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-8-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-8-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-8-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z DEBUG events TFJob default/test-status-8 has failed because 2 Worker replica(s) failed. 
{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "TFJobFailed"} 2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-8 has failed because 2 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"succeeded":2,"failed":2}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}} 2023-07-03T15:53:49Z INFO passed! {"job name": "test-status-8", "job description": "(No chief worker) 2 workers are succeeded, 2 workers are failed"} 2023-07-03T15:53:49Z INFO testing case {"description": "(No chief worker) worker-0 are succeeded, 3 workers are active"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z DEBUG 
events Pod: default.test-status-9-worker-0 exited with code 0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "ExitedWithCode"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-worker-3 {"type": "Normal", "object": 
{"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-ps-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-ps-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z DEBUG events TFJob default/test-status-9 successfully completed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "TFJobSucceeded"} 2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-9 successfully completed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":3,"succeeded":1}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}} 2023-07-03T15:53:49Z INFO passed! 
{"job name": "test-status-9", "job description": "(No chief worker) worker-0 are succeeded, 3 workers are active"} 2023-07-03T15:53:49Z INFO testing case {"description": "(No chief worker, successPolicy: AllWorkers) worker-0 are succeeded, 3 workers are active"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"} 2023-07-03T15:53:49Z DEBUG events Pod: default.test-status-10-worker-0 exited with code 0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "ExitedWithCode"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-10-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-10-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not 
found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-10-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-10-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-10 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":3,"succeeded":1}},"startTime":"2023-07-03T15:53:49Z"}} 2023-07-03T15:53:49Z INFO passed! 
{"job name": "test-status-10", "job description": "(No chief worker, successPolicy: AllWorkers) worker-0 are succeeded, 3 workers are active"} 2023-07-03T15:53:49Z INFO testing case {"description": "(No chief worker, successPolicy: AllWorkers) 4 workers are succeeded"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"} 2023-07-03T15:53:49Z DEBUG events Pod: default.test-status-11-worker-0 exited with code 0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "ExitedWithCode"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-11-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-11-worker-1 {"type": "Normal", "object": 
{"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-11-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-11-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z DEBUG events TFJob default/test-status-11 successfully completed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "TFJobSucceeded"} 2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-11 successfully completed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":4}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}} 2023-07-03T15:53:49Z INFO passed! 
{"job name": "test-status-11", "job description": "(No chief worker, successPolicy: AllWorkers) 4 workers are succeeded"} 2023-07-03T15:53:49Z INFO testing case {"description": "(No chief worker, successPolicy: AllWorkers) worker-0 is succeeded, 2 workers are running, 1 worker is failed"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"} 2023-07-03T15:53:49Z DEBUG events Pod: default.test-status-12-worker-0 exited with code 0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "ExitedWithCode"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-12-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-12-worker-1 {"type": "Normal", "object": 
{"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-12-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-12-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z DEBUG events TFJob default/test-status-12 has failed because 1 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "TFJobFailed"} 2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-12 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-12 has failed because 1 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":2,"succeeded":1,"failed":1}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}} 2023-07-03T15:53:49Z INFO passed! 
{"job name": "test-status-12", "job description": "(No chief worker, successPolicy: AllWorkers) worker-0 is succeeded, 2 workers are running, 1 worker is failed"} 2023-07-03T15:53:49Z INFO testing case {"description": "Chief is running, workers are failed"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-worker-0 {"type": "Normal", "object": 
{"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-ps-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z 
INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-ps-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"} 2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-chief-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"} 2023-07-03T15:53:49Z DEBUG events TFJob default/test-status-13 has failed because 4 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "TFJobFailed"} 2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-13 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-13 has failed because 4 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{"active":1},"PS":{"active":2},"Worker":{"failed":4}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}} 2023-07-03T15:53:49Z INFO passed! 
{"job name": "test-status-13", "job description": "Chief is running, workers are failed"} 2023-07-03T15:53:49Z INFO testing case {"description": "Chief is running, workers are succeeded"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"} 2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"} 2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":null,"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"succeeded":4}}}} [FAILED] in [It] - /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 07/03/23 15:53:49.286 << Timeline [FAILED] Expected <bool>: false to be true In [It] at: 
/home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 07/03/23 15:53:49.286 ------------------------------
https://github.com/kubeflow/training-operator/actions/runs/5446399997/jobs/9907080100?pr=1843#step:4:345
It seems that the others are resolved.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/lifecycle frozen