
Flaky test: [It] should update TFJob with desired status

Open tenzen-y opened this issue 1 year ago • 14 comments

Failing case: Chief worker is succeeded

• [FAILED] [0.026 seconds]
TFJob controller Test Status [It] should update TFJob with desired status
/home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:76

  Timeline >>
  2023-05-31T14:16:30Z	INFO	testing case	{"description": "Chief worker is succeeded"}
  2023-05-31T14:16:30Z	DEBUG	events	TFJob default/test-tfjob has failed because 1 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-tfjob"}, "reason": "TFJobFailed"}
  2023-05-31T14:16:30Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": "default/test-status-0", "unable to fetch TFJob": "default/test-status-0"}
  2023-05-31T14:16:30Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": "default/test-status-0", "unable to fetch TFJob": "default/test-status-0"}
  2023-05-31T14:16:30Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": "default/test-status-0", "unable to fetch TFJob": "default/test-status-0"}
  2023-05-31T14:16:30Z	DEBUG	events	Created service: test-status-0-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"a24bd44b-a9de-48dc-a982-e54353988325"}, "reason": "SuccessfulCreateService"}
  2023-05-31T14:16:30Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": "default/test-status-0", "unable to fetch TFJob": "default/test-status-0"}
  2023-05-31T14:16:30Z	DEBUG	events	Created service: test-status-0-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"a24bd44b-a9de-48dc-a982-e54353988325"}, "reason": "SuccessfulCreateService"}
  2023-05-31T14:16:30Z	INFO	KubeAPIWarningLogger	unknown field "spec.tfReplicaSpecs.Chief.template.metadata.creationTimestamp"
  2023-05-31T14:16:30Z	INFO	KubeAPIWarningLogger	unknown field "spec.tfReplicaSpecs.Worker.template.metadata.creationTimestamp"
  2023-05-31T14:16:30Z	INFO	checking status	{"tfJob.Status": {"conditions":null,"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-05-31T14:16:30Z"}}
  [FAILED] in [It] - /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 05/31/23 14:16:30.732
  << Timeline

  [FAILED] Expected
      <bool>: false
  to be true
  In [It] at: /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 05/31/23 14:16:30.732
------------------------------

https://github.com/kubeflow/training-operator/actions/runs/5133950363/jobs/9237255811#step:4:126

tenzen-y, May 31 '23 17:05

Similar flaky test: (No chief worker) Worker is running

------------------------------
• [FAILED] [0.098 seconds]
TFJob controller Test Status [It] should update TFJob with desired status
/home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:76
...
  Timeline >>
  2023-06-04T17:26:13Z	INFO	TFJob.kubeflow.org "test-status-4" not found	{"tfjob": "default/test-status-4", "unable to fetch TFJob": "default/test-status-4"}
  2023-06-04T17:26:13Z	INFO	TFJob.kubeflow.org "test-status-4" not found	{"tfjob": "default/test-status-4", "unable to fetch TFJob": "default/test-status-4"}
  2023-06-04T17:26:13Z	DEBUG	events	Created service: test-status-4-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"8d78a0a3-fb74-416b-902b-1cb0794616e1"}, "reason": "SuccessfulCreateService"}
  2023-06-04T17:26:13Z	DEBUG	events	TFJob default/test-status-4 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"8d78a0a3-fb74-416b-902b-1cb0794616e1"}, "reason": "TFJobSucceeded"}
  2023-06-04T17:26:13Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-4 successfully completed.","lastUpdateTime":"2023-06-04T17:26:13Z","lastTransitionTime":"2023-06-04T17:26:13Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-06-04T17:26:13Z","completionTime":"2023-06-04T17:26:13Z"}}
  2023-06-04T17:26:13Z	INFO	passed!	{"job name": "test-status-4", "job description": "(No chief worker) Worker is succeeded"}
  2023-06-04T17:26:13Z	INFO	testing case	{"description": "(No chief worker) Worker is running"}
  2023-06-04T17:26:13Z	INFO	checking status	{"tfJob.Status": {"conditions":null,"replicaStatuses":{"Chief":{},"PS":{},"Worker":{}}}}
  [FAILED] in [It] - /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 06/04/23 17:26:13.951
  << Timeline

  [FAILED] Expected
      <bool>: false
  to be true
  In [It] at: /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 06/04/23 17:26:13.951

https://github.com/kubeflow/training-operator/actions/runs/5170338558/jobs/9313174311?pr=1824#step:4:406

tenzen-y, Jun 04 '23 17:06

Another flaky test, "should create missing Pods": https://github.com/kubeflow/training-operator/actions/runs/5312480601/jobs/9617036307?pr=1837

lowang-bh, Jun 20 '23 08:06

@lowang-bh Thanks for reporting that. That failure doesn't appear to be related to this test, so I created a separate issue:

https://github.com/kubeflow/training-operator/issues/1838

tenzen-y, Jun 20 '23 18:06

Similar flaky test:

------------------------------
• [FAILED] [0.531 seconds]
TFJob controller Test Status [It] should update TFJob with desired status
/home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:76

  Timeline >>
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "Chief worker is succeeded"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-tfjob has failed because 1 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-tfjob"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-0-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-0-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-0 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	INFO	KubeAPIWarningLogger	unknown field "spec.tfReplicaSpecs.Chief.template.metadata.creationTimestamp"
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-0 successfully completed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"succeeded":1},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-0", "job description": "Chief worker is succeeded"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "Chief worker is running"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-1-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	DEBUG	events	Created pod: test-status-1-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreatePod"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-1-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-1 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"active":1},"PS":{},"Worker":{}},"startTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-1", "job description": "Chief worker is running"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "Chief worker is failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	DEBUG	events	Created pod: test-status-2-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreatePod"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-2-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-2-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-2 has failed because 1 Chief replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-2 has failed because 1 Chief replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"failed":1},"PS":{},"Worker":{}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-2", "job description": "Chief worker is failed"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) Worker is failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-3" not found	{"tfjob": {"name":"test-status-3","namespace":"default"}, "unable to fetch TFJob": "default/test-status-3"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-3" not found	{"tfjob": {"name":"test-status-3","namespace":"default"}, "unable to fetch TFJob": "default/test-status-3"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-3-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-3","uid":"629f34d4-36bd-461b-ab76-b667b82f26de"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-3 has failed because 1 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-3","uid":"629f34d4-36bd-461b-ab76-b667b82f26de"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-3 has failed because 1 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"failed":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-3", "job description": "(No chief worker) Worker is failed"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) Worker is succeeded"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-4" not found	{"tfjob": {"name":"test-status-4","namespace":"default"}, "unable to fetch TFJob": "default/test-status-4"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-4" not found	{"tfjob": {"name":"test-status-4","namespace":"default"}, "unable to fetch TFJob": "default/test-status-4"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-4-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"4d770ef0-c20a-4743-a5c0-d0311b742723"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-4 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"4d770ef0-c20a-4743-a5c0-d0311b742723"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-4 successfully completed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-4", "job description": "(No chief worker) Worker is succeeded"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) Worker is running"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-5" not found	{"tfjob": {"name":"test-status-5","namespace":"default"}, "unable to fetch TFJob": "default/test-status-5"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-5" not found	{"tfjob": {"name":"test-status-5","namespace":"default"}, "unable to fetch TFJob": "default/test-status-5"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-5-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-5","uid":"9d893137-d9f5-4e03-9112-ffd9e07d3dce"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-5 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":1}},"startTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-5", "job description": "(No chief worker) Worker is running"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) 2 workers are succeeded, 2 workers are active"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	KubeAPIWarningLogger	unknown field "spec.tfReplicaSpecs.PS.template.metadata.creationTimestamp"
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-6 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":2,"succeeded":2}},"startTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-6", "job description": "(No chief worker) 2 workers are succeeded, 2 workers are active"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) 2 workers are running, 2 workers are failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-7 has failed because 2 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-7 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-7 has failed because 2 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":2,"failed":2}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-7", "job description": "(No chief worker) 2 workers are running, 2 workers are failed"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) 2 workers are succeeded, 2 workers are failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-8-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-8-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-8 has failed because 2 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-8 has failed because 2 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"succeeded":2,"failed":2}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-8", "job description": "(No chief worker) 2 workers are succeeded, 2 workers are failed"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-9-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-9 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-9 successfully completed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":3,"succeeded":1}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-9", "job description": "(No chief worker) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker, successPolicy: AllWorkers) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-10-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-10 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":3,"succeeded":1}},"startTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-10", "job description": "(No chief worker, successPolicy: AllWorkers) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker, successPolicy: AllWorkers) 4 workers are succeeded"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-11-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-11 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-11 successfully completed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":4}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-11", "job description": "(No chief worker, successPolicy: AllWorkers) 4 workers are succeeded"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker, successPolicy: AllWorkers) worker-0 is succeeded, 2 workers are running, 1 worker is failed"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-12-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-12 has failed because 1 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-12 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-12 has failed because 1 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":2,"succeeded":1,"failed":1}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-12", "job description": "(No chief worker, successPolicy: AllWorkers) worker-0 is succeeded, 2 workers are running, 1 worker is failed"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "Chief is running, workers are failed"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-13 has failed because 4 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-13 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-13 has failed because 4 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{"active":1},"PS":{"active":2},"Worker":{"failed":4}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-13", "job description": "Chief is running, workers are failed"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "Chief is running, workers are succeeded"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":null,"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"succeeded":4}}}}
  [FAILED] in [It] - /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 07/03/23 15:53:49.286
  << Timeline

  [FAILED] Expected
      <bool>: false
  to be true
  In [It] at: /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 07/03/23 15:53:49.286
------------------------------

https://github.com/kubeflow/training-operator/actions/runs/5446399997/jobs/9907080100?pr=1843#step:4:345

tenzen-y Jul 03 '23 16:07

/assign

tenzen-y Jul 03 '23 17:07

I couldn't reproduce these failures locally with the following command:

KUBEBUILDER_ASSETS="$(shell setup-envtest use $(ENVTEST_K8S_VERSION) -p path)" \
	./bin/ginkgo --until-it-fails -v ./pkg/controller.v1/tensorflow/...

I guess these tests take a bit of time, so we should increase the timeout for gomega.Eventually, because the CI environment has much more limited computing resources than a local machine.

tenzen-y Jul 03 '23 17:07

The default timeout for gomega.Eventually is 1s.
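
To illustrate why a 1s limit can be too short on a slow runner, here is a minimal sketch using only the Go standard library. The helper `eventually`, the helper `readyAfter`, and the constants (including the 1.5s delay and the 10s extended timeout) are hypothetical stand-ins chosen for illustration; they are not taken from status_test.go and are not the gomega implementation.

```go
package main

import (
	"fmt"
	"time"
)

// All values below are assumptions for illustration only.
const (
	defaultTimeout  = 1 * time.Second         // mirrors the 1s default mentioned above
	extendedTimeout = 10 * time.Second        // hypothetical longer limit for a slow CI runner
	pollInterval    = 10 * time.Millisecond   // how often the condition is re-checked
	reconcileDelay  = 1500 * time.Millisecond // pretend the status update takes 1.5s
)

// eventually polls cond every interval until it returns true or the timeout
// elapses. It is a simplified stand-in for a polling assertion.
func eventually(cond func() bool, timeout, interval time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return true
		}
		if !time.Now().Before(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}

// readyAfter returns a condition that starts reporting true once delay has
// passed, simulating a controller that needs some time to update the status.
func readyAfter(delay time.Duration) func() bool {
	readyAt := time.Now().Add(delay)
	return func() bool { return !time.Now().Before(readyAt) }
}

func main() {
	// The condition becomes true after 1.5s, so the 1s limit gives up too early.
	fmt.Println("default timeout: ", eventually(readyAfter(reconcileDelay), defaultTimeout, pollInterval))
	// A longer limit leaves room for the slower update.
	fmt.Println("extended timeout:", eventually(readyAfter(reconcileDelay), extendedTimeout, pollInterval))
}
```

In the real test, the analogous change would presumably be to pass an explicit timeout and polling interval to gomega.Eventually, or to call gomega.SetDefaultEventuallyTimeout once in the suite setup.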

tenzen-y Jul 03 '23 17:07

That is a good point

johnugeorge Jul 03 '23 18:07

That is a good point

I'm working on this improvement.

tenzen-y Jul 03 '23 18:07

Only this failure has recurred:

  • https://github.com/kubeflow/training-operator/actions/runs/5451187547/jobs/9917173825#step:4:510
  • https://github.com/kubeflow/training-operator/actions/runs/5451187547/jobs/9917173661#step:4:831

/reopen

Similar flaky test:

------------------------------
• [FAILED] [0.531 seconds]
TFJob controller Test Status [It] should update TFJob with desired status
/home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:76

  Timeline >>
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "Chief worker is succeeded"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-tfjob has failed because 1 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-tfjob"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-0-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-0-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-0 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	INFO	KubeAPIWarningLogger	unknown field "spec.tfReplicaSpecs.Chief.template.metadata.creationTimestamp"
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-0 successfully completed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"succeeded":1},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-0", "job description": "Chief worker is succeeded"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "Chief worker is running"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-1-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	DEBUG	events	Created pod: test-status-1-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreatePod"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-1-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-1 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"active":1},"PS":{},"Worker":{}},"startTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-1", "job description": "Chief worker is running"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "Chief worker is failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	DEBUG	events	Created pod: test-status-2-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreatePod"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-2-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-2-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-2 has failed because 1 Chief replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-2 has failed because 1 Chief replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"failed":1},"PS":{},"Worker":{}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-2", "job description": "Chief worker is failed"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) Worker is failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-3" not found	{"tfjob": {"name":"test-status-3","namespace":"default"}, "unable to fetch TFJob": "default/test-status-3"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-3" not found	{"tfjob": {"name":"test-status-3","namespace":"default"}, "unable to fetch TFJob": "default/test-status-3"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-3-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-3","uid":"629f34d4-36bd-461b-ab76-b667b82f26de"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-3 has failed because 1 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-3","uid":"629f34d4-36bd-461b-ab76-b667b82f26de"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-3 has failed because 1 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"failed":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-3", "job description": "(No chief worker) Worker is failed"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) Worker is succeeded"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-4" not found	{"tfjob": {"name":"test-status-4","namespace":"default"}, "unable to fetch TFJob": "default/test-status-4"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-4" not found	{"tfjob": {"name":"test-status-4","namespace":"default"}, "unable to fetch TFJob": "default/test-status-4"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-4-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"4d770ef0-c20a-4743-a5c0-d0311b742723"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-4 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"4d770ef0-c20a-4743-a5c0-d0311b742723"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-4 successfully completed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-4", "job description": "(No chief worker) Worker is succeeded"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) Worker is running"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-5" not found	{"tfjob": {"name":"test-status-5","namespace":"default"}, "unable to fetch TFJob": "default/test-status-5"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-5" not found	{"tfjob": {"name":"test-status-5","namespace":"default"}, "unable to fetch TFJob": "default/test-status-5"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-5-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-5","uid":"9d893137-d9f5-4e03-9112-ffd9e07d3dce"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-5 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":1}},"startTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-5", "job description": "(No chief worker) Worker is running"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) 2 workers are succeeded, 2 workers are active"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	KubeAPIWarningLogger	unknown field "spec.tfReplicaSpecs.PS.template.metadata.creationTimestamp"
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-6 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":2,"succeeded":2}},"startTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-6", "job description": "(No chief worker) 2 workers are succeeded, 2 workers are active"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) 2 workers are running, 2 workers are failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-7 has failed because 2 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-7 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-7 has failed because 2 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":2,"failed":2}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-7", "job description": "(No chief worker) 2 workers are running, 2 workers are failed"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) 2 workers are succeeded, 2 workers are failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-8-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-8-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-8 has failed because 2 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-8 has failed because 2 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"succeeded":2,"failed":2}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-8", "job description": "(No chief worker) 2 workers are succeeded, 2 workers are failed"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-9-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-9 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-9 successfully completed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":3,"succeeded":1}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-9", "job description": "(No chief worker) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker, successPolicy: AllWorkers) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-10-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-10 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":3,"succeeded":1}},"startTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-10", "job description": "(No chief worker, successPolicy: AllWorkers) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker, successPolicy: AllWorkers) 4 workers are succeeded"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-11-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-11 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-11 successfully completed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":4}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-11", "job description": "(No chief worker, successPolicy: AllWorkers) 4 workers are succeeded"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker, successPolicy: AllWorkers) worker-0 is succeeded, 2 workers are running, 1 worker is failed"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-12-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-12 has failed because 1 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-12 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-12 has failed because 1 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":2,"succeeded":1,"failed":1}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-12", "job description": "(No chief worker, successPolicy: AllWorkers) worker-0 is succeeded, 2 workers are running, 1 worker is failed"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "Chief is running, workers are failed"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-13 has failed because 4 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-13 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-13 has failed because 4 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{"active":1},"PS":{"active":2},"Worker":{"failed":4}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-13", "job description": "Chief is running, workers are failed"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "Chief is running, workers are succeeded"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":null,"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"succeeded":4}}}}
  [FAILED] in [It] - /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 07/03/23 15:53:49.286
  << Timeline

  [FAILED] Expected
      <bool>: false
  to be true
  In [It] at: /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 07/03/23 15:53:49.286
------------------------------

https://github.com/kubeflow/training-operator/actions/runs/5446399997/jobs/9907080100?pr=1843#step:4:345

tenzen-y Jul 04 '23 05:07

@tenzen-y: Reopened this issue.

In response to this:

Only this one has recurred:

https://github.com/kubeflow/training-operator/actions/runs/5451187547/jobs/9917173825#step:4:510

/reopen

Similar flaky test:

------------------------------
• [FAILED] [0.531 seconds]
TFJob controller Test Status [It] should update TFJob with desired status
/home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:76

  Timeline >>
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "Chief worker is succeeded"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-tfjob has failed because 1 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-tfjob"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-0-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-0-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-0 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	INFO	KubeAPIWarningLogger	unknown field "spec.tfReplicaSpecs.Chief.template.metadata.creationTimestamp"
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-0 successfully completed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"succeeded":1},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-0", "job description": "Chief worker is succeeded"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "Chief worker is running"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-1-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	DEBUG	events	Created pod: test-status-1-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreatePod"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-1-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-1 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"active":1},"PS":{},"Worker":{}},"startTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-1", "job description": "Chief worker is running"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "Chief worker is failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	DEBUG	events	Created pod: test-status-2-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreatePod"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-2-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-2-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-2 has failed because 1 Chief replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-2 has failed because 1 Chief replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"failed":1},"PS":{},"Worker":{}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-2", "job description": "Chief worker is failed"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) Worker is failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-3" not found	{"tfjob": {"name":"test-status-3","namespace":"default"}, "unable to fetch TFJob": "default/test-status-3"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-3" not found	{"tfjob": {"name":"test-status-3","namespace":"default"}, "unable to fetch TFJob": "default/test-status-3"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-3-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-3","uid":"629f34d4-36bd-461b-ab76-b667b82f26de"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-3 has failed because 1 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-3","uid":"629f34d4-36bd-461b-ab76-b667b82f26de"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-3 has failed because 1 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"failed":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-3", "job description": "(No chief worker) Worker is failed"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) Worker is succeeded"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-4" not found	{"tfjob": {"name":"test-status-4","namespace":"default"}, "unable to fetch TFJob": "default/test-status-4"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-4" not found	{"tfjob": {"name":"test-status-4","namespace":"default"}, "unable to fetch TFJob": "default/test-status-4"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-4-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"4d770ef0-c20a-4743-a5c0-d0311b742723"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-4 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"4d770ef0-c20a-4743-a5c0-d0311b742723"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-4 successfully completed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-4", "job description": "(No chief worker) Worker is succeeded"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) Worker is running"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-5" not found	{"tfjob": {"name":"test-status-5","namespace":"default"}, "unable to fetch TFJob": "default/test-status-5"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-5" not found	{"tfjob": {"name":"test-status-5","namespace":"default"}, "unable to fetch TFJob": "default/test-status-5"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-5-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-5","uid":"9d893137-d9f5-4e03-9112-ffd9e07d3dce"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-5 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":1}},"startTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-5", "job description": "(No chief worker) Worker is running"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) 2 workers are succeeded, 2 workers are active"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	KubeAPIWarningLogger	unknown field "spec.tfReplicaSpecs.PS.template.metadata.creationTimestamp"
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-6 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":2,"succeeded":2}},"startTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-6", "job description": "(No chief worker) 2 workers are succeeded, 2 workers are active"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) 2 workers are running, 2 workers are failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-7 has failed because 2 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-7 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-7 has failed because 2 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":2,"failed":2}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-7", "job description": "(No chief worker) 2 workers are running, 2 workers are failed"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) 2 workers are succeeded, 2 workers are failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-8-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-8-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-8 has failed because 2 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-8 has failed because 2 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"succeeded":2,"failed":2}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-8", "job description": "(No chief worker) 2 workers are succeeded, 2 workers are failed"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-9-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-9 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-9 successfully completed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":3,"succeeded":1}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-9", "job description": "(No chief worker) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker, successPolicy: AllWorkers) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-10-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-10 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":3,"succeeded":1}},"startTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-10", "job description": "(No chief worker, successPolicy: AllWorkers) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker, successPolicy: AllWorkers) 4 workers are succeeded"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-11-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-11 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-11 successfully completed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":4}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-11", "job description": "(No chief worker, successPolicy: AllWorkers) 4 workers are succeeded"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker, successPolicy: AllWorkers) worker-0 is succeeded, 2 workers are running, 1 worker is failed"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-12-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-12 has failed because 1 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-12 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-12 has failed because 1 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":2,"succeeded":1,"failed":1}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-12", "job description": "(No chief worker, successPolicy: AllWorkers) worker-0 is succeeded, 2 workers are running, 1 worker is failed"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "Chief is running, workers are failed"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-13 has failed because 4 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-13 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-13 has failed because 4 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{"active":1},"PS":{"active":2},"Worker":{"failed":4}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-13", "job description": "Chief is running, workers are failed"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "Chief is running, workers are succeeded"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":null,"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"succeeded":4}}}}
  [FAILED] in [It] - /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 07/03/23 15:53:49.286
  << Timeline

  [FAILED] Expected
      <bool>: false
  to be true
  In [It] at: /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 07/03/23 15:53:49.286
------------------------------

https://github.com/kubeflow/training-operator/actions/runs/5446399997/jobs/9907080100?pr=1843#step:4:345


google-oss-prow[bot] avatar Jul 04 '23 05:07 google-oss-prow[bot]

It seems that the others are resolved.

tenzen-y avatar Jul 04 '23 05:07 tenzen-y

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Oct 02 '23 10:10 github-actions[bot]

/lifecycle frozen

tenzen-y avatar Oct 02 '23 14:10 tenzen-y