db-scheduler

Recurring task starting at fixed delays

Open silmeth opened this issue 8 years ago • 13 comments

At the moment, one can create a RecurringTask that is rescheduled with FixedDelay, but that is literally a fixed delay between the completion of the previous execution and the next scheduled execution.

That means that if a recurring task has a FixedDelay of 5 seconds, is scheduled for 14:00:00.00, actually starts at 14:00:00.05 and takes 0.03 s to execute, its next execution will be scheduled for 14:00:05.08.
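To make the drift concrete, here is a minimal sketch in plain java.time (no db-scheduler API involved) of the two ways a next execution time can be derived. NextExecution and its methods are hypothetical names for illustration, and the timings (a 50 ms pickup lag and a 30 ms run) are assumptions chosen to reproduce the 14:00:05.08 figure.

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical helper contrasting how the next execution time is derived.
class NextExecution {

    // Fixed delay: counted from when the previous execution finished,
    // so pickup lag and run time shift every later execution.
    static LocalTime fixedDelay(LocalTime completedAt, Duration delay) {
        return completedAt.plus(delay);
    }

    // Fixed rate: counted from the previous scheduled time,
    // so executions stay on a fixed grid regardless of lag or run time.
    static LocalTime fixedRate(LocalTime previouslyScheduled, Duration rate) {
        return previouslyScheduled.plus(rate);
    }

    public static void main(String[] args) {
        Duration interval = Duration.ofSeconds(5);
        LocalTime scheduled = LocalTime.parse("14:00:00");
        LocalTime started = LocalTime.parse("14:00:00.05");        // assumed 50 ms pickup lag
        LocalTime completed = started.plus(Duration.ofMillis(30)); // assumed 30 ms run time

        System.out.println("fixed delay -> " + fixedDelay(completed, interval));
        System.out.println("fixed rate  -> " + fixedRate(scheduled, interval));
    }
}
```

Every fixed-delay execution inherits the lag of the one before it, while the fixed-rate result depends only on the previous scheduled time.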

Is there a way to get a recurring task that is scheduled for 14:00:00.00, then 14:00:05.00, then 14:00:10.00, etc., without time drift caused by the execution time and by the delay between the planned execution time and the moment the task is actually picked up?

silmeth • Apr 20 '17 12:04

I see two ways to achieve this.

  • Create an array of fixed LocalTimes spaced by the given interval (00:00:00, 00:00:05, 00:00:10, etc.) and use a RecurringTask with a Daily schedule at those times. This does not allow intervals that do not evenly divide 24 hours.
  • Implement a new CompletionHandler that reschedules an execution based on its original execution time (e.g. like this), and then create a FixedRateTask that uses such a handler. This works; if some executions were missed (e.g. because the system was down), the task is rescheduled in the past and, as a result, is executed as many times as it would have been had the system been running the whole time.
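The first approach only needs the list of daily trigger times. Below is a minimal sketch, assuming plain java.time and a hypothetical helper DailyGrid.timesEvery; it does not call db-scheduler, and handing the list to a Daily schedule is left out. It also makes the divisor restriction explicit: a 5 s interval yields 17,280 times per day, while 7 s is rejected.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the daily trigger times for the "fixed LocalTimes" approach.
class DailyGrid {
    static final Duration DAY = Duration.ofDays(1);

    static List<LocalTime> timesEvery(Duration interval) {
        if (interval.isZero() || interval.isNegative()) {
            throw new IllegalArgumentException("interval must be positive");
        }
        // The grid only repeats cleanly day after day if the interval divides 24 h exactly.
        if (DAY.toNanos() % interval.toNanos() != 0) {
            throw new IllegalArgumentException("interval does not divide 24 hours: " + interval);
        }
        long count = DAY.toNanos() / interval.toNanos();
        List<LocalTime> times = new ArrayList<>();
        for (long i = 0; i < count; i++) {
            times.add(LocalTime.MIDNIGHT.plus(interval.multipliedBy(i)));
        }
        return times;
    }

    public static void main(String[] args) {
        List<LocalTime> times = timesEvery(Duration.ofSeconds(5));
        System.out.println(times.size() + " daily times, starting " + times.subList(0, 3));
        try {
            timesEvery(Duration.ofSeconds(7));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```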

silmeth • Apr 20 '17 14:04

Good suggestion! I have not added such functionality because I personally have never had a use for the fixed-rate variant. It also raises the question of what to do when an execution takes longer than the interval it is scheduled at :).

May I ask what the use-case is? It would help when discussing the solution.

Regarding your solutions, I think the second would be a good idea if that solves your use-case. You get to isolate the behavior in a CompletionHandler, which is nice.

kagkarlsson • Apr 20 '17 18:04

The use case, simplified, is processing data being streamed to a data warehouse. The data must be processed in fixed time frames, every n seconds, with an additional delay (e.g. another 10 s) to make sure all the needed data has been committed to storage. What's important is that every time frame gets processed (even if later than planned) and that the frames are processed in order.

I intended to tie the time frame of the data to be processed to the planned execution time of the task. If a task is scheduled for scheduled_time, it processes data with timestamps where scheduled_time - 20 s ≤ timestamp < scheduled_time - 10 s. So I needed the scheduled times of consecutive executions to differ by exactly the same interval, down to the millisecond.
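Tying the data window to the scheduled time can be sketched as follows. This assumes a 10 s lag and a 10 s window width (matching the -20 s / -10 s bounds) and a task rate of 10 s; FrameWindow and forScheduledTime are hypothetical names. With an exact fixed rate, each window starts exactly where the previous one ended, so no data falls between frames; any drift in the scheduled time would open a gap or an overlap.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical value type: the half-open data window [from, until) processed by one execution.
class FrameWindow {
    final Instant from;  // inclusive
    final Instant until; // exclusive

    FrameWindow(Instant from, Instant until) {
        this.from = from;
        this.until = until;
    }

    // scheduled - (lag + width) <= timestamp < scheduled - lag
    static FrameWindow forScheduledTime(Instant scheduled, Duration lag, Duration width) {
        Instant until = scheduled.minus(lag);
        return new FrameWindow(until.minus(width), until);
    }

    boolean contains(Instant timestamp) {
        return !timestamp.isBefore(from) && timestamp.isBefore(until);
    }

    public static void main(String[] args) {
        Duration tenSeconds = Duration.ofSeconds(10);
        Instant scheduled = Instant.parse("2017-04-21T10:00:00Z");

        FrameWindow current = forScheduledTime(scheduled, tenSeconds, tenSeconds);
        // With an exact 10 s rate the next execution is scheduled exactly 10 s later,
        FrameWindow next = forScheduledTime(scheduled.plus(tenSeconds), tenSeconds, tenSeconds);

        System.out.println("current: [" + current.from + ", " + current.until + ")");
        System.out.println("next:    [" + next.from + ", " + next.until + ")");
        // so the windows tile the timeline without gaps or overlaps.
        System.out.println("contiguous: " + next.from.equals(current.until));
    }
}
```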

But now I see that I do not have access to the scheduled_time/execution time in the task itself, so I need to keep the information about the time frame somewhere else, probably in a separate database table. That makes the requirement for tasks to be scheduled at an exact fixed rate less strict, though I'd still prefer it, to avoid delay accumulating over time.

silmeth • Apr 21 '17 08:04

Ok, thanks!

I think it would be a good idea to let the task have access to the execution time. We could add it as a field in the ExecutionContext class.

I am hoping to do a new release of the project soon and will see if this addition can be made. Or, if you would like to submit a PR for it, feel free :)

kagkarlsson • Apr 21 '17 18:04

I have exposed the Execution in the ExecutionContext in master now. See commit c3528353dd12b5e8c2be8d5d7d1b229b70eb4a30

kagkarlsson • Apr 24 '17 07:04

Do you think it’s a good idea to add the FixedRateTask to the library too? That’d need the OnCompleteRescheduleByByPreviousExecution completion handler, which in turn definitely needs a better, shorter, name ;-).

Regarding the “what to do when executions take longer than the rate they are scheduled” question, I am not sure what the best solution for the general public would be. Still, given that scheduled executions actually run some time after their scheduled time (depending on the polling interval and database responsiveness), I think always running only one instance of the task, and rescheduling it in the past when it finishes, is reasonable.

For example, the task is scheduled at 10:00:00 with an interval of 10 s, is picked up and executed at 10:00:04, and takes 7 s to finish. It finishes at 10:00:11, only then is rescheduled in the past at 10:00:10, and is immediately picked up and executed again. IMO that would be OK as a default behaviour, but it would need to be well documented so as not to confuse anybody. :)
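The rescheduling rule in this example can be sketched independently of the scheduler. This is a minimal sketch in plain java.time, assuming the next time is always the previous scheduled time plus the rate; FixedRateReschedule and its methods are hypothetical and are not db-scheduler's CompletionHandler API. overdueRuns also illustrates the catch-up behaviour described earlier in the thread: after downtime, every missed slot is still due.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of "reschedule from the previous scheduled time" (not db-scheduler's API).
class FixedRateReschedule {

    // The next time ignores when the execution was picked up or finished.
    static Instant next(Instant previousScheduled, Duration rate) {
        return previousScheduled.plus(rate);
    }

    // A next time that is not after the completion time is already due,
    // so the scheduler would pick it up again straight away.
    static boolean dueImmediately(Instant nextScheduled, Instant completedAt) {
        return !nextScheduled.isAfter(completedAt);
    }

    // How many grid slots are already due at 'now', e.g. after downtime.
    // Each of them would be executed in turn, catching up on the missed frames.
    static int overdueRuns(Instant nextScheduled, Instant now, Duration rate) {
        if (rate.isZero() || rate.isNegative()) {
            throw new IllegalArgumentException("rate must be positive");
        }
        int runs = 0;
        for (Instant t = nextScheduled; !t.isAfter(now); t = t.plus(rate)) {
            runs++;
        }
        return runs;
    }

    public static void main(String[] args) {
        Duration rate = Duration.ofSeconds(10);
        Instant scheduled = Instant.parse("2017-04-24T10:00:00Z");
        Instant pickedUp = Instant.parse("2017-04-24T10:00:04Z");
        Instant completed = pickedUp.plus(Duration.ofSeconds(7)); // finishes at 10:00:11

        Instant nextTime = next(scheduled, rate);                 // 10:00:10, already in the past
        System.out.println("next: " + nextTime + ", due immediately: " + dueImmediately(nextTime, completed));
        System.out.println("overdue at 10:00:45: "
                + overdueRuns(nextTime, Instant.parse("2017-04-24T10:00:45Z"), rate));
    }
}
```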

To avoid such behaviour (which is desirable in my case, but may well not be in others'), one can check the system time when the task is picked up and set a timeout, short relative to the interval, to make sure the task ends before it is due to be rescheduled.

To make the solution more elegant, the task would need to carry its timeout along with its data. This would require either a DB schema change or simply storing it in the task state, as described in #5.

silmeth • Apr 24 '17 08:04

Yes, I think including a FixedRateTask / GuaranteedRateTask / FixedRateRecurringTask (still trying to decide on an appropriate name...) in the lib would be a good idea :). We just need to make the difference clear in the documentation. I think it will be a sort of special case, though, for when you want more exact control over the execution rate and times.

This task could be extended with a constructor parameter that defines the desired behavior when the calculated next execution time is in the past: SKIP, ACCEPT, ACCEPT_AND_WARN, or something like that. SKIP could skip the calculated time if it is in the past and calculate a new one based on the executionComplete time.
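The proposed constructor parameter could look roughly like this. It is a sketch only: PastDuePolicy and FixedRateNext are hypothetical names, java.util.logging stands in for whatever logging the library would use, and SKIP is read as "completion time plus the rate", which is one possible interpretation of recalculating from the executionComplete time.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.logging.Logger;

// Hypothetical policy for a calculated next execution time that already lies in the past.
enum PastDuePolicy { SKIP, ACCEPT, ACCEPT_AND_WARN }

class FixedRateNext {
    private static final Logger LOG = Logger.getLogger(FixedRateNext.class.getName());

    static Instant next(Instant previousScheduled, Instant completedAt, Duration rate, PastDuePolicy policy) {
        Instant candidate = previousScheduled.plus(rate);
        if (!candidate.isBefore(completedAt)) {
            return candidate; // on schedule: the fixed-rate time is still ahead (or exactly now)
        }
        switch (policy) {
            case ACCEPT:
                return candidate; // run again immediately, catching up
            case ACCEPT_AND_WARN:
                LOG.warning("Next execution " + candidate + " is already in the past (completed at " + completedAt + ")");
                return candidate;
            case SKIP:
            default:
                // Drop the past-due slot and recalculate from the completion time.
                return completedAt.plus(rate);
        }
    }

    public static void main(String[] args) {
        Instant scheduled = Instant.parse("2017-04-24T10:00:00Z");
        Instant completed = Instant.parse("2017-04-24T10:00:11Z");
        Duration rate = Duration.ofSeconds(10);
        for (PastDuePolicy policy : PastDuePolicy.values()) {
            System.out.println(policy + " -> " + next(scheduled, completed, rate, policy));
        }
    }
}
```

Another possible SKIP variant would jump to the first grid slot after the completion time, which keeps the fixed-rate grid intact at the cost of dropping frames.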

I am not sure I follow what you mean by the timeout. Are you suggesting that the scheduler should try to cancel a running execution if it takes longer than the timeout? I think that would be a bit complex...

kagkarlsson • Apr 24 '17 19:04

> This task may be extended to take some constructor-parameter that defines what behavior you want when the calculated next execution time is in the past. SKIP,ACCEPT,ACCEPT_AND_WARN.

That’s a very good idea. I hadn’t thought of it.

> Not sure I follow what you mean with the timeout. Are you suggesting that the scheduler try to cancel a running execution if it is taking longer than the timeout? I think that would be a bit complex..

Well, I was thinking of what a user of the library might do on their end: they could wrap the whole execution in a CompletableFuture and time it out before the next planned execution. Of course, doing the same on the library’s side would be much more complex and would require quite a bit of redesign of the whole architecture… But don’t worry, I wasn’t suggesting doing it. :)
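On the user's side, the idea could be sketched with CompletableFuture.orTimeout (available since Java 9). DeadlineRunner and runBefore are hypothetical names; the time budget is simply what is left until the next planned execution, which the caller is assumed to know.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical user-side wrapper: stop waiting for the work once the next frame is due.
class DeadlineRunner {

    // Returns true if the work finished before nextPlanned, false if it timed out
    // or if there was no time left to start it.
    static boolean runBefore(Runnable work, Instant nextPlanned) {
        long budgetMillis = Duration.between(Instant.now(), nextPlanned).toMillis();
        if (budgetMillis <= 0) {
            return false; // already past the next planned execution
        }
        CompletableFuture<Void> future = CompletableFuture.runAsync(work)
                .orTimeout(budgetMillis, TimeUnit.MILLISECONDS);
        try {
            future.join();
            return true;
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                return false; // deadline hit; the work itself may still be running
            }
            throw e; // the work failed for another reason
        }
    }

    public static void main(String[] args) {
        Instant nextPlanned = Instant.now().plusMillis(500);
        boolean finished = runBefore(() -> {
            try {
                Thread.sleep(2_000); // simulated slow frame
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, nextPlanned);
        System.out.println("finished before next planned execution: " + finished);
    }
}
```

Note that orTimeout only completes the future exceptionally; it does not interrupt or stop the work, which keeps running on its pool thread. A hard cutoff would require the work itself to check for cancellation or be stopped some other way.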

silmeth • Apr 25 '17 12:04

How did this work out for you, @silmeth?

kagkarlsson • May 04 '17 09:05

Sorry, I was on vacation for a while and then busy with other things. We ended up using a custom FixedRateTask and a custom-extended scheduler and taskRepo (which pick tasks in batches to minimize DB queries). So far it seems to be working pretty well.

Perhaps I will send a simple PR with the FixedRateTask this week or next (I can’t promise at the moment, though, as it requires some cleanup on my side and adding a few things that we did not need).

silmeth • May 16 '17 14:05

Ok. Yeah, db-scheduler is not yet optimized for high volumes. What volumes are you working with that made you patch it? Also, I am not sure picking tasks in batches is completely safe, but if it works for you... :)

kagkarlsson • May 16 '17 18:05

FixedRate schedules should be a bit easier to implement now that Schedule has access to the full ExecutionComplete type.

kagkarlsson • Apr 06 '18 14:04

I know this thread is old, but, in case it is of interest, we recently fixed a bug that limited the scheduler's throughput.

kagkarlsson • Apr 25 '19 07:04