Volcano Scheduler clutters logs and makes troubleshooting more difficult
Description
I've noticed the volcano-scheduler pod keeps throwing warnings such as "Failed to delete task" for pods using the default-scheduler.
These warnings are generally harmless for cluster operation. They do not prevent either Volcano-managed or standard workloads from running, but they can clutter logs and make troubleshooting more difficult.
I couldn't find any documentation on how to filter out this log noise.
Steps to reproduce the issue
- Install Volcano; the default values are fine
- Create a namespace `test` and a dummy deployment "test" without setting the Volcano scheduler; use nginx, for example (a client-go sketch of these steps follows the list)
- Verify that the pods have `schedulerName: default-scheduler`
- Tail the volcano-scheduler logs
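For completeness, here is a minimal client-go sketch of the namespace/deployment setup, as an alternative to doing it with kubectl. The kubeconfig path and the object names are assumptions matching the steps above:

```go
// repro.go — minimal client-go sketch of the reproduction steps above.
// Assumes a kubeconfig at ~/.kube/config; the "test" namespace and "test"
// deployment match the names used in the steps. Illustration only.
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	kubeconfig := filepath.Join(os.Getenv("HOME"), ".kube", "config")
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()

	// Create the "test" namespace.
	_, err = client.CoreV1().Namespaces().Create(ctx, &corev1.Namespace{
		ObjectMeta: metav1.ObjectMeta{Name: "test"},
	}, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}

	// Create a dummy nginx deployment that is NOT managed by Volcano:
	// schedulerName stays "default-scheduler".
	replicas := int32(1)
	dep := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: "test", Namespace: "test"},
		Spec: appsv1.DeploymentSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{MatchLabels: map[string]string{"app": "test"}},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{"app": "test"}},
				Spec: corev1.PodSpec{
					SchedulerName: "default-scheduler",
					Containers: []corev1.Container{
						{Name: "nginx", Image: "nginx"},
					},
				},
			},
		},
	}
	if _, err := client.AppsV1().Deployments("test").Create(ctx, dep, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
	fmt.Println("created test/test with schedulerName=default-scheduler; now tail the volcano-scheduler logs")
}
```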
Describe the results you received and expected
Example log:
```
volcano-scheduler-5459dc465b-9bpm4:volcano-scheduler W0617 15:07:28.259063 1 event_handlers.go:364] Failed to delete task: errors: 1: task test/test-c5b7c89dc has null jobID
```
My understanding:
Volcano's scheduler expects to manage pods created by Volcano Jobs, which have an associated JobID. When it encounters pods (such as those from Deployments, StatefulSets, or other controllers not using Volcano) without this association, it logs a warning because it cannot find a Volcano Job context for the pod.
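To make the idea concrete, here is a simplified, self-contained sketch of that behaviour. The types and names are made up for illustration and are not the actual code in event_handlers.go:

```go
// Simplified illustration of the behaviour described above: a pod that was not
// created by a Volcano Job carries no job reference, so the scheduler's delete
// handler has nothing to remove and logs a warning instead. All names here are
// hypothetical, not Volcano's real API.
package main

import (
	"fmt"
	"log"
)

// taskInfo stands in for the scheduler's per-pod bookkeeping entry.
type taskInfo struct {
	Namespace string
	Name      string
	JobID     string // empty for pods not owned by a Volcano Job (e.g. plain Deployments)
}

// jobCache stands in for the scheduler cache, keyed by JobID.
type jobCache map[string]map[string]taskInfo

// deleteTask mimics the "Failed to delete task" path: without a JobID there is
// no job entry to remove the task from.
func (c jobCache) deleteTask(t taskInfo) error {
	if t.JobID == "" {
		return fmt.Errorf("task %s/%s has null jobID", t.Namespace, t.Name)
	}
	delete(c[t.JobID], t.Namespace+"/"+t.Name)
	return nil
}

func main() {
	cache := jobCache{}
	// A pod from a plain Deployment scheduled by the default-scheduler: no JobID.
	p := taskInfo{Namespace: "test", Name: "test-c5b7c89dc"}
	if err := cache.deleteTask(p); err != nil {
		log.Printf("Failed to delete task: %v", err) // the warning seen in the logs
	}
}
```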
My expectation:
To be able to silence the warning. Otherwise, troubleshooting becomes very difficult when NOT using Volcano as the sole scheduler.
What version of Volcano are you using?
1.12.1
Any other relevant information
No response
I have also noticed this, thanks. It would be great if you could help us solve this problem :) BTW, does this problem occur only on v1.12.1?
If the pod is eventually scheduled by the default-scheduler, there will be an update event, and the Volcano scheduler will re-add the pod into its cache (delete first, then add back to the cache), so a log like this appears.
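A rough, self-contained sketch of that delete-then-re-add flow, assuming the delete step is what emits the warning for pods without a JobID (the names are hypothetical, not Volcano's actual API):

```go
// Simplified sketch of the update path described above: when a pod bound by
// the default-scheduler is updated, its cache entry is removed and re-added,
// and the delete step emits the warning for pods with no Volcano JobID.
package main

import (
	"fmt"
	"log"
)

type podEntry struct {
	Namespace, Name, JobID, NodeName string
}

type podCache map[string]podEntry

func (c podCache) delete(p podEntry) error {
	if p.JobID == "" {
		return fmt.Errorf("task %s/%s has null jobID", p.Namespace, p.Name)
	}
	delete(c, p.Namespace+"/"+p.Name)
	return nil
}

func (c podCache) add(p podEntry) {
	c[p.Namespace+"/"+p.Name] = p
}

// updatePod mirrors "delete first, then add to cache" on an update event.
func (c podCache) updatePod(oldPod, newPod podEntry) {
	if err := c.delete(oldPod); err != nil {
		log.Printf("Failed to delete task: %v", err) // harmless, but noisy
	}
	c.add(newPod)
}

func main() {
	c := podCache{}
	before := podEntry{Namespace: "test", Name: "test-c5b7c89dc"}                       // unscheduled, no JobID
	after := podEntry{Namespace: "test", Name: "test-c5b7c89dc", NodeName: "node-1"}    // bound by default-scheduler
	c.updatePod(before, after) // triggers the warning once, then re-adds the pod
}
```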
> I have also noticed this, thanks. It would be great if you could help us solve this problem :) BTW, does this problem occur only on v1.12.1?

I'm evaluating Volcano at the moment; I've only tried the latest version.
I'll take a look at the source and evaluate what options we have to filter the logs.
> If the pod is eventually scheduled by the default-scheduler, there will be an update event, and the Volcano scheduler will re-add the pod into its cache (delete first, then add back to the cache), so a log like this appears.

That's completely fine with me, as long as we can silence that log.
/assign @hajnalmt
I tracked down some of these caching issues. ☺️ I'm waiting for a review from the community.