
ScheduledSparkApplication controller ignores schedule spec changes, requiring a manual edit of status.nextRun in the status subresource

Open dk805 opened this issue 7 months ago • 4 comments

What happened?

Description: ScheduledSparkApplication controller ignores schedule spec changes and requires manual status subresource editing to apply new schedules.

Workaround: Edit the status subresource and manually set nextRun to the desired start time. I suppose you could also set scheduledState to "New".

Controller Code Issue: The controller in pkg/controller/scheduledsparkapplication/controller.go lacks spec change detection logic and only recalculates nextRun when status.nextRun is zero. I am not sure whether this is intended for some reason I am not seeing.

Possibly this pattern could be helpful? https://alenkacz.medium.com/kubernetes-operator-best-practices-implementing-observedgeneration-250728868792

And then, in the switch case for when the state is already Scheduled, check that the generations match to determine if a recalculation of NextRunTime is in order: https://github.com/kubeflow/spark-operator/blob/master/internal/controller/scheduledsparkapplication/controller.go#L160

If you agree, I am happy to raise a PR

Reproduction Code

  1. Create a ScheduledSparkApplication with schedule "10 * * * *"
  2. Let it reach ScheduleStateScheduled
  3. Update spec.schedule to "15 * * * *", e.g. run kubectl edit scheduledsparkapp <app-name> and change the schedule
  4. Observe that nextRun time is not recalculated

Expected behavior

  • Controller should detect spec changes and recalculate status.nextRun automatically
  • Should implement generation/observedGeneration pattern like other K8s controllers

Actual behavior

  • Editing the main ScheduledSparkApplication spec.schedule field does not trigger recalculation of status.nextRun
  • The controller continues using the old schedule timing stored in status.nextRun
  • Only workaround is manually editing the status subresource

Impacted by this bug?

Give it a 👍 We prioritize the issues with most 👍

dk805 avatar Jun 06 '25 17:06 dk805

I am seeing this as well with v2.2.0 after upgrading from the v1 series of the operator. Similarly, if the ScheduledSparkApplication transitions into a "FailedValidation" state, it is stuck and can't be edited in place to fix the validation issue without clearing the status.

The workaround from @dk805 does work in the interim; however, it would be great to have this working again.

aolear-ss avatar Jun 18 '25 21:06 aolear-ss

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Sep 16 '25 22:09 github-actions[bot]

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

github-actions[bot] avatar Oct 06 '25 22:10 github-actions[bot]

/reopen This is still an issue, and a really annoying one: you need to delete the resource for the new schedule times to take effect.

Surely we can fix this?

BenCoughlan15 avatar Nov 21 '25 10:11 BenCoughlan15