Cavalcade-Runner icon indicating copy to clipboard operation
Cavalcade-Runner copied to clipboard

Improve logic for setting next run time in `Job::reschedule()`.

Open peterwilsoncc opened this issue 6 years ago • 7 comments

Job::reschedule() uses the value nextrun + interval when calculating the new time to run the job.

https://github.com/humanmade/Cavalcade-Runner/blob/master/inc/class-job.php#L84

This causes the jobs to run constantly if the task is originally scheduled before the current timestamp until nextrun + interval is after the current time stamp.

To reproduce schedule an event with wp_schedule_event( 0, 'hourly', 'pwcc_every_hour' ) as a shorthand way of setting the task to start immediately and then once every hour. The outcome is:

  • the action pwcc_every_hour fires 426,388 consecutively
  • the action pwcc_every_hour continues firing once each hour

A better approach would be to record the start time when the lock is obtained and add the interval to that when rescheduling.

peterwilsoncc avatar Aug 23 '18 04:08 peterwilsoncc

I have the same problem. I've stopped my cavalcade runner for a while (some days) to do maintenance. Now I have lots of cron jobs running everytime, they are scheduled hourly.

To fix the problem, i've changed this line with $this->nextrun = date( MYSQL_DATE_FORMAT, time() + $this->interval );

There is this PR opened about this problem too

edubacco avatar May 06 '19 19:05 edubacco

The patch has been here for 1.5 years. Is anyone going to merge it?

archon810 avatar Aug 18 '20 20:08 archon810

Fixed in #64 and tag 1.0.1

roborourke avatar Sep 18 '20 16:09 roborourke

@roborourke How forcing the nextrun calculation to use UTC solves this issue? Correct me if I'm wrong, but I believe the case created by @peterwilsoncc will still be an issue with forced UTC.

maciejmackowiak avatar Sep 21 '20 12:09 maciejmackowiak

@maciejmackowiak quite right, I misread the issue as it was relating to the same section of code #64 dealt with.

roborourke avatar Sep 21 '20 14:09 roborourke

We're running into a similar situation on a multisite install.

Our install has a lot of cron jobs scheduled (mostly because of plugins needing syncing functionality). Meaning the number of workers cannot adequately handle the number of jobs. This causes a backlog to occur and this backlog will keep increasing until we purge these older waiting jobs.

We could increase the number of workers used, but that's somewhat of a workaround. It would be nice if a hook was added to the rescheduling portion of the runner so we could override the nextrun + interval behavior if needed.

Perhaps after this block:

https://github.com/humanmade/Cavalcade-Runner/blob/0dfb42d505e9cd870a11366c49ee680d327c961a/inc/class-job.php#L86-L91

We could add the following:

$this->nextrun = Runner::instance()->hooks->run( 'Job.reschedule.nextrun', $this->nextrun, $this );

r-a-y avatar Apr 02 '21 21:04 r-a-y

I'd like to chime in and connect this issue to the thread at https://github.com/humanmade/Cavalcade/issues/74#issuecomment-453149568. Ryan states in his follow-up that this rescheduling behavior is intentional because it provides resiliency in case the Runner crashes and then later restarts, and adds:

So long as everything is in UTC, there should be no issue with this, as the clock (generally) monotonically increases.

But the problem described in this ticket is not with monotonicity. The scheduled events in the backlog being described here have timestamps that are correctly ordered. The problem is that the timestamps are all in the past, so that the nextrun + interval calculation never catches up to the current timestamp.

Changing to time() + interval would make the problem go away, but I understand there may be concern that certain tasks would be run less frequently than the scheduling "client" (whatever code is calling wp_schedule_event()) expects it to be. In my experience, this change would not matter for the vast majority of WP scheduled tasks. For example, if a plugin checks an RSS feed every hour, and it misses three checks due to a lack of workers, it is not important that those three checks be "made up".

That being said, a more conservative option might be for the runner to refuse to reschedule a task if an identical task (same hook + site + args) is already scheduled with a past timestamp (ie, it's due and waiting for available runners).

boonebgorges avatar Apr 09 '21 14:04 boonebgorges