Volcano Scheduler clutters logs and makes troubleshooting more difficult

Open cmontemuino opened this issue 7 months ago • 2 comments

Description

I've noticed the volcano-scheduler pod keeps throwing warnings such as "Failed to delete task" for pods using the default-scheduler.

These warnings are generally harmless for cluster operation. They do not prevent either Volcano-managed or standard workloads from running, but they can clutter logs and make troubleshooting more difficult.

I couldn't find any documentation on how to filter out this log noise.

Steps to reproduce the issue

  1. Install Volcano; default values are ok
  2. Create a namespace "test" and a dummy deployment "test" (nginx, for example) without setting the Volcano scheduler.
  3. Verify that the pod spec shows schedulerName: default-scheduler
  4. Tail the volcano-scheduler logs
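
For step 3, a small script can make the check less error-prone. This is only a sketch: it assumes the pod list has been exported as JSON (for example with `kubectl get pods -n test -o json`), and the function name and the sample data are made up for illustration. It groups pod names by `spec.schedulerName`.

```python
import json


def pods_by_scheduler(pod_list_json):
    """Group pod names by spec.schedulerName from a JSON pod list."""
    groups = {}
    for pod in json.loads(pod_list_json).get("items", []):
        # Fall back to "default-scheduler" if the field is missing in the input.
        scheduler = pod.get("spec", {}).get("schedulerName", "default-scheduler")
        groups.setdefault(scheduler, []).append(pod["metadata"]["name"])
    return groups


# Hypothetical, trimmed-down pod list; real output has many more fields.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "test-c5b7c89dc"},
         "spec": {"schedulerName": "default-scheduler"}},
        {"metadata": {"name": "worker-0"},
         "spec": {"schedulerName": "volcano"}},
    ]
})

groups = pods_by_scheduler(sample)
print(groups)
```

Any pod listed under `default-scheduler` here is one that should produce the warning when it is updated.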

Describe the results you received and expected

Example log:

volcano-scheduler-5459dc465b-9bpm4:volcano-scheduler W0617 15:07:28.259063       1 event_handlers.go:364] Failed to delete task: errors:  1: task test/test-c5b7c89dc has null jobID

My understanding:

Volcano's scheduler expects to manage pods created by Volcano Jobs, which have an associated JobID. When it encounters pods (such as those from Deployments, StatefulSets, or other controllers not using Volcano) without this association, it logs a warning because it cannot find a Volcano Job context for the pod.

My expectations:

Being able to silence the warning. Otherwise troubleshooting becomes very difficult when NOT using Volcano as the sole scheduler.
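
Until there is a supported way to silence it, one possible stopgap is to filter the lines on the reader's side. The sketch below is not a Volcano feature; it is a plain text filter, and the regular expression is an assumption based only on the single log line quoted above. The second sample line is a placeholder, not real scheduler output.

```python
import re

# Pattern taken from the one example in this issue; adjust if the wording differs.
NOISE = re.compile(r"Failed to delete task: .*has null jobID")


def drop_noise(lines):
    """Yield only the lines that do not match the null-jobID warning."""
    for line in lines:
        if not NOISE.search(line):
            yield line


sample = [
    "W0617 15:07:28.259063       1 event_handlers.go:364] "
    "Failed to delete task: errors:  1: task test/test-c5b7c89dc has null jobID",
    "placeholder for any other log line",
]

kept = list(drop_noise(sample))
print(kept)
```

The same filter could be applied to a saved log file by passing the open file object to `drop_noise`. It hides the symptom only; the scheduler still emits the warning.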

What version of Volcano are you using?

1.12.1

Any other relevant information

No response

cmontemuino avatar Jun 17 '25 15:06 cmontemuino

I have also noticed that, thanks. It would be great if you could help us solve this problem :) BTW, is v1.12.1 the only version that causes this problem?

JesseStutler avatar Jun 18 '25 06:06 JesseStutler

If the pod is eventually scheduled by the default-scheduler, there will be an update event, and the Volcano scheduler will re-add the pod to its cache (delete first, then add), which is why a log line like this appears.
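
To illustrate the delete-then-add sequence, here is a toy model. It is not Volcano's code: the class, its methods, and the `batch/worker-0` / `job-1` names are hypothetical, and the assumption that job-less pods are simply not tracked on add is mine. It only shows how an update handled as delete plus add can log a warning for a pod with no job ID.

```python
class ToyCache:
    """Toy stand-in for a scheduler cache keyed by job ID (names hypothetical)."""

    def __init__(self):
        self.jobs = {}       # job_id -> set of pod names
        self.warnings = []   # collected warning messages

    def add(self, pod, job_id):
        # Assumption: pods without a job ID are not tracked at all.
        if job_id is not None:
            self.jobs.setdefault(job_id, set()).add(pod)

    def delete(self, pod, job_id):
        # A pod with no job ID cannot be looked up, so deletion fails.
        if job_id is None:
            raise ValueError(f"task {pod} has null jobID")
        self.jobs.get(job_id, set()).discard(pod)

    def update(self, pod, job_id):
        # An update is modelled as delete first, then add.
        try:
            self.delete(pod, job_id)
        except ValueError as err:
            self.warnings.append(f"Failed to delete task: {err}")
        self.add(pod, job_id)


cache = ToyCache()
cache.update("test/test-c5b7c89dc", None)  # Deployment pod, no job ID
cache.update("batch/worker-0", "job-1")    # hypothetical job-managed pod
print(cache.warnings)
```

In this model the warning is a side effect of the delete step and does not stop the add step, which matches the observation that workloads keep running.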

Monokaix avatar Jun 18 '25 06:06 Monokaix

I have also noticed that, thanks. It would be great if you could help us solve this problem :) BTW, is v1.12.1 the only version that causes this problem?

I'm evaluating Volcano at the moment; latest version only.

I'll take a look at the source and evaluate what options we have for filtering the logs.

cmontemuino avatar Jun 20 '25 06:06 cmontemuino

If the pod is eventually scheduled by the default-scheduler, there will be an update event, and the Volcano scheduler will re-add the pod to its cache (delete first, then add), which is why a log line like this appears.

That's completely fine with me, as long as we can silence that log line.

cmontemuino avatar Jun 20 '25 06:06 cmontemuino

/assign @hajnalmt

hajnalmt avatar Sep 26 '25 12:09 hajnalmt

I tracked down some of these caching bugs. ☺️ I'm waiting for a review from the community.

hajnalmt avatar Oct 10 '25 12:10 hajnalmt