fix: Fixed an issue which causes random scheduling of playbooks
Problem
For cron-scheduled jobs, the scheduler used last_exec_time_sec to compute the next run. This could be stale (e.g., after restarts, long delays, or missed executions), leading to:
- Negative delays: last_exec_time_sec + next_delay - current_time could be negative, causing immediate or incorrect execution.
- Incorrect timing: the next run was based on a past timestamp instead of the current time.
This PR solves the following issues:
- Execution of all scheduled playbooks 120s after Robusta restart
- Completely random execution of scheduled playbooks outside of scheduled time.
This patch has been part of my production for 2 weeks now and I haven't observed any issues.
Solution The fix changes cron delay calculation to:
- Use current time as the base:
croniter(job.scheduling_params.cron_expression, now).get_next()- now computes the delay from now to the next occurrence. - Ignore stale
last_exec_time_secfor cron schedules. - Preserve deployment protection: for new jobs (
JobStatus.NEW), still enforceINITIAL_SCHEDULE_DELAY_SEC(120 seconds) to avoid race conditions during deployments when old and new runners might both see the job.
Impact
- Prevents negative delays and incorrect immediate runs.
- Ensures cron jobs run at the correct time relative to the current moment.
- Maintains the initial delay protection for new jobs during deployments.
Walkthrough
Modifies __calc_job_delay_for_next_run in the scheduler to implement explicit cron-based delay calculation for CronScheduleRepeat. For new jobs, applies minimum initial delay; for other statuses, returns computed next_delay directly. Removes reliance on last_exec_time_sec for cron scheduling.
Changes
| Cohort / File(s) | Change Summary |
|---|---|
Cron delay calculation logic src/robusta/core/schedule/scheduler.py |
Refactors __calc_job_delay_for_next_run to add explicit cron-based delay path using croniter; applies minimum INITIAL_SCHEDULE_DELAY_SEC for CronScheduleRepeat with NEW state; returns direct computed delay for non-NEW states; preserves existing behavior for non-cron types; removes last_exec_time_sec dependency. |
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
- Requires verification that all scheduling type branches (
CronScheduleRepeat,DynamicDelayRepeat, etc.) maintain correct behavior - Edge case validation needed for NEW vs. non-NEW job states to ensure timing assumptions hold
- Impact on initial deployment timing and potential race condition protections should be verified
Possibly related PRs
- robusta-dev/robusta#1944: Modifies the same
__calc_job_delay_for_next_runmethod and changes initial delay handling forCronScheduleRepeatwith new jobs.
Suggested reviewers
- Sheeproid
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly addresses the main issue fixed: random scheduling of playbooks caused by stale timestamp calculations in cron-based scheduling. |
| Description check | ✅ Passed | The description clearly explains the problem with stale last_exec_time_sec, the specific issues it caused, the solution implemented, and the positive impact, all directly related to the changeset. |
✨ Finishing touches
- [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Comment @coderabbitai help to get the list of available commands and usage tips.