volcano
Optimize jobflow controller to reduce invalid reconciles
I noticed some error messages while using the JobFlow feature. I created a JobFlow resource from these examples: https://github.com/volcano-sh/volcano/blob/master/example/jobflow/JobFlow.yaml https://github.com/volcano-sh/volcano/blob/master/example/jobflow/JobTemplate.yaml
Here are the controller manager logs:
[root@master01 ~]# kubectl logs -n volcano-system volcano-controllers-744bc4796d-jbncj | grep ^E
E0425 10:34:49.690189 1 jobflow_controller_action.go:69] Failed to update status of JobFlow default/test: Operation cannot be fulfilled on jobflows.flow.volcano.sh "test": the object has been modified; please apply your changes to the latest version and try again
E0425 10:34:49.707411 1 jobflow_controller_action.go:69] Failed to update status of JobFlow default/test: Operation cannot be fulfilled on jobflows.flow.volcano.sh "test": the object has been modified; please apply your changes to the latest version and try again
E0425 10:34:50.321009 1 queue_controller_action.go:85] Failed to update status of Queue default: Operation cannot be fulfilled on queues.scheduling.volcano.sh "default": the object has been modified; please apply your changes to the latest version and try again.
E0425 10:34:51.395417 1 queue_controller_action.go:85] Failed to update status of Queue default: Operation cannot be fulfilled on queues.scheduling.volcano.sh "default": the object has been modified; please apply your changes to the latest version and try again.
E0425 10:35:04.721574 1 jobflow_controller_action.go:69] Failed to update status of JobFlow default/test: Operation cannot be fulfilled on jobflows.flow.volcano.sh "test": the object has been modified; please apply your changes to the latest version and try again
E0425 10:35:04.736015 1 jobflow_controller_action.go:69] Failed to update status of JobFlow default/test: Operation cannot be fulfilled on jobflows.flow.volcano.sh "test": the object has been modified; please apply your changes to the latest version and try again
E0425 10:35:05.568771 1 jobflow_controller_action.go:69] Failed to update status of JobFlow default/test: Operation cannot be fulfilled on jobflows.flow.volcano.sh "test": the object has been modified; please apply your changes to the latest version and try again
E0425 10:35:05.581852 1 jobflow_controller_action.go:69] Failed to update status of JobFlow default/test: Operation cannot be fulfilled on jobflows.flow.volcano.sh "test": the object has been modified; please apply your changes to the latest version and try again
E0425 10:35:20.711708 1 queue_controller_action.go:85] Failed to update status of Queue default: Operation cannot be fulfilled on queues.scheduling.volcano.sh "default": the object has been modified; please apply your changes to the latest version and try again.
E0425 10:35:21.731150 1 queue_controller_action.go:85] Failed to update status of Queue default: Operation cannot be fulfilled on queues.scheduling.volcano.sh "default": the object has been modified; please apply your changes to the latest version and try again.
E0425 10:35:34.692296 1 job_controller.go:334] Failed to get job by <Queue: , Job: default/test-b, Task:default-nginx, Event:PodEvicted, ExitCode:0, Action:, JobVersion: 0> from cache: job <default/test-b> is not ready
E0425 10:35:34.695945 1 job_controller.go:334] Failed to get job by <Queue: , Job: default/test-b, Task:default-nginx, Event:PodEvicted, ExitCode:0, Action:, JobVersion: 0> from cache: job <default/test-b> is not ready
E0425 10:35:34.698687 1 job_controller.go:334] Failed to get job by <Queue: , Job: default/test-c, Task:default-nginx, Event:PodEvicted, ExitCode:0, Action:, JobVersion: 0> from cache: job <default/test-c> is not ready
E0425 10:35:34.701790 1 job_controller.go:334] Failed to get job by <Queue: , Job: default/test-c, Task:default-nginx, Event:PodEvicted, ExitCode:0, Action:, JobVersion: 0> from cache: job <default/test-c> is not ready
E0425 10:35:34.707817 1 job_controller.go:334] Failed to get job by <Queue: , Job: default/test-d, Task:default-nginx, Event:PodEvicted, ExitCode:0, Action:, JobVersion: 0> from cache: job <default/test-d> is not ready
E0425 10:35:34.712693 1 job_controller.go:334] Failed to get job by <Queue: , Job: default/test-d, Task:default-nginx, Event:PodEvicted, ExitCode:0, Action:, JobVersion: 0> from cache: job <default/test-d> is not ready
E0425 10:35:34.714371 1 job_controller.go:334] Failed to get job by <Queue: , Job: default/test-e, Task:default-nginx, Event:PodEvicted, ExitCode:0, Action:, JobVersion: 0> from cache: job <default/test-e> is not ready
E0425 10:35:34.715187 1 jobflow_controller_action.go:300] Failed to delete job of JobFlow default/test: jobs.batch.volcano.sh "test-a" not found
E0425 10:35:34.715210 1 jobflow_controller_action.go:46] Failed to delete jobs of JobFlow default/test: jobs.batch.volcano.sh "test-a" not found
E0425 10:35:34.717377 1 job_controller.go:334] Failed to get job by <Queue: , Job: default/test-e, Task:default-nginx, Event:PodEvicted, ExitCode:0, Action:, JobVersion: 0> from cache: job <default/test-e> is not ready
E0425 10:35:34.723456 1 job_controller.go:334] Failed to get job by <Queue: , Job: default/test-a, Task:default-nginx, Event:PodEvicted, ExitCode:0, Action:, JobVersion: 0> from cache: job <default/test-a> is not ready
E0425 10:35:34.728548 1 job_controller.go:334] Failed to get job by <Queue: , Job: default/test-a, Task:default-nginx, Event:PodEvicted, ExitCode:0, Action:, JobVersion: 0> from cache: failed to find job <default/test-a>
This PR focuses only on the jobflow_controllers.go errors.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
To complete the pull request process, please assign shinytang6
You can assign the PR to them by writing /assign @shinytang6 in a comment when ready.
/auto-cc
@lowang-bh @hwdef PTAL