redbeat icon indicating copy to clipboard operation
redbeat copied to clipboard

rrule: first run delayed

Open yoch opened this issue 6 years ago • 15 comments

@concreted : When I use recurrent rule, the first action isn't taken at time.

Example (as you can see, the first run is delayed from 21:20:03 to 21:19:58 without a clear reason):

test.py

from celery import Celery

app = Celery()

@app.task
def test():
    print('test')

run.py

from redbeat.schedules import rrule
from redbeat import RedBeatSchedulerEntry
from test import test, app

if __name__ == '__main__':
    schedule = rrule('MINUTELY', interval=3, count=4)
    print(schedule) # <rrule: freq: 5, dtstart: 2017-12-09 20:17:03.882+00:00, interval: 3, count: 4, ...>
    entry = RedBeatSchedulerEntry('test', test.name, schedule, app=app)
    entry.save()

out.log

[2017-12-09 21:19:58,320: INFO/MainProcess] Received task: test.test[7298979f-333e-4aae-b83d-f09cd7e2e7c0]
[2017-12-09 21:20:03,028: INFO/MainProcess] Received task: test.test[832d9876-8171-4c58-9835-ad5013e4b95d]
[2017-12-09 21:23:03,016: INFO/MainProcess] Received task: test.test[b7de89c3-c010-47de-a8c2-49350ff58050]
[2017-12-09 21:26:03,018: INFO/MainProcess] Received task: test.test[1050295c-0ea8-4824-89b2-ac71efa2c1ba]

yoch avatar Dec 09 '17 21:12 yoch

Another (seems related) problem, is that if I set the whole rrule before current time, it's not scheduled at all.

dt = datetime.now() - timedelta(minutes=15)
schedule = rrule('MINUTELY', dtstart=dt, interval=3, count=4)
entry = RedBeatSchedulerEntry('test', test.name, schedule, app=app)
print(entry.is_due())    # schedstate(is_due=False, next=2699.667361)

yoch avatar Dec 10 '17 07:12 yoch

@concreted, are you able to investigate?

sibson avatar Dec 10 '17 16:12 sibson

Planning to investigate this week, thanks for the repro steps.

concreted avatar Dec 10 '17 17:12 concreted

This looks related to the fact that Celery sets ScheduleEntry.last_run_at to utcnow() by default. When I forced the rrule entry's last_run_at to be None, it behaved as expected (runs the first occurrence immediately, next occurrences run according to schedule):

run.py

schedule = rrule('MINUTELY', interval=1, count=2)
print(schedule) # <rrule: freq: 5, dtstart: 2017-12-10 19:31:22.637379+00:00, interval: 
entry = RedBeatSchedulerEntry('test', test.name, schedule, app=app())
entry.last_run_at = None
entry.save()

out.log

[2017-12-10 19:31:25,163: WARNING/ForkPoolWorker-1] test
[2017-12-10 19:32:22,011: WARNING/ForkPoolWorker-2] test

@yoch As a workaround can you try setting entry.last_run_at = None on your rrule entry before saving, and see if it works correctly for you?

@sibson What do you think is the right approach here? I tried setting last_run_at in RedbeatSchedulerEntry.__init__ to either None or epoch, but that broke a lot of tests. I'm worried there are things that rely on Celery's default behavior.

concreted avatar Dec 10 '17 20:12 concreted

@concreted: Thank you !

Your workaround works for me in the first case, but not for the second one (only the first run was fired).

yoch avatar Dec 10 '17 20:12 yoch

@yoch Thanks for confirming. I will be looking into the second case - want to confirm, what is your expected behavior for your example? Would you expect that all four occurrences fire immediately, because they should have all occurred in the past?

concreted avatar Dec 10 '17 20:12 concreted

@concreted : Yes, I think all these occurrences should fire almost immediately, or at least one at each beat.

In fact, the more realistic case is when for some reason (e.g. a congestion), one rrule event fires after that the next one is already due, which may cause to skip some events.

yoch avatar Dec 10 '17 21:12 yoch

My feeling is celery's behaviour of defaulting last_run_at = now(), is wrong and I've had to work around it. I don't fully understand the ramifications of changing the constructor to default to epoch. None, won't work as celery will override it later. I've lost track of the various things I tried and why they failed to work. So maybe a fresh set of eyes will be able to solve it. RRule is new enough I'd be open to making an isolated breaking change there if needed.

sibson avatar Dec 11 '17 17:12 sibson

I'm looking to get a 2.0.0 release out. Is there a change to rrule that can be made before then to resolve this issue?

sibson avatar Aug 20 '19 02:08 sibson

@sibson When are you hoping to get a release out? I may be able to take a look this week. Will confirm tomorrow.

concreted avatar Aug 20 '19 02:08 concreted

I don't have a date in mind, I can wait for this to be resolved if you are planning to work on it.

sibson avatar Aug 20 '19 11:08 sibson

I will spend some time tomorrow to investigate this and let you know if I'll be able to fix. Thanks!

concreted avatar Aug 21 '19 03:08 concreted

I looked into this yesterday. There are two separate issues here:

  1. rrules without dtstart set don't fire their first event. Expected: first event fires.
  2. rrules with dtstart in the past don't fire any events. Expected: all past events fire quickly immediately after entry created.

Currently I'm using last_run_at as a pointer in time so that rrule can return the next occurrence after that.

I believe issue with 1) is that Celery's last_run_at is wrongly set to now() by default by Celery, and this is getting saved as the meta: last_run_at. If you don't set a dtstart on the rrule, I have it default to now(). Because generally schedules are created first, then RedBeatSchedulerEntrys, this means the entry's last_run_at is always after the first occurrence unless you explicitly set dtstart to some time in the future.

The issue with 2) is somewhat related, in that if last_run_at is after all rrule occurrences were supposed to occur, ask for the next occurrence will return nothing. This is expected and correct behavior for the underlying rrule object, but isn't the expected behavior for task scheduling.

What I'm thinking is that rrule functions best when the last_run_at time is exactly the time of the last rrule occurrence that ran - by that I mean the rrule occurrence's actual scheduled time to run, not the time that Celery actually ran the task. This would solve both problems, since it would always know exactly when to calculate the next occurrence from, instead of using the execution time.

I have a branch here (https://github.com/sibson/redbeat/compare/master...concreted:rrule-first-run-delayed?expand=1#diff-f064d9e67be22082f4bff1e5ecee4a3cR43) that demonstrates the correct behavior for both cases if is_due() is called directly on an rrule entry (see run.py). It doesn't work with Redbeat, because Redbeat would have to store and pass the last scheduled task time data back to the rrule.

@sibson What are your thoughts on how to address this? I thought of using some new metadata e.g. last_scheduled_task_at, and passing that instead of last_run_at to the entry as a special case for rrule. Another way could be to update the schedule definition object itself with this info (see usage of self.last_run_at in redbeat/schedules.py). But dynamically updating the definition also doesn't seem like a great idea.

Setting last_run_at to None or epoch as a special case for rrule would solve 1) but not 2). For example, if you had an rrule set to run for 4 occurrences every second on Jan 1 1990, if we set last_run_at to epoch, only the first event would run because last_run_at would immediately be set to after the 4 expected occurrences.

concreted avatar Aug 23 '19 21:08 concreted

I've run into similar issues with celery's defaulting last_run_at to now() behaviour. I don't think I found a satisfying solution then either. I do wonder if we need to solve this problem more generally to make the rrule solution simpler to implement.

I'd like to find away to localize the rrule special behaviours to rrule related classes without unduly impacting other parts of the code. That makes it easier to reason about in future. I don't know how that influences which is the better option of the two you proposed. Unless, I'm missing a use case I'm inclined to worry less about possible edge cases in 1990, than getting 2019 onwards working as expected.

sibson avatar Aug 27 '19 03:08 sibson

Hello,

It seems I am having a similar problem: When I start my task without providing dtstart, task is straight loaded and fired. Everything works fine !

Problem is, when I specify dtstart, task is loaded at desired start date but not fired ! It never starts

Any idea why my task does not get fired when dtstart is provided ?

I have been searching a lot around the issue, and found nothing. I would really appreciate if you could help me on this.

add_task.py:

date_delayed = datetime.datetime(2019, 8, 29, 16, 19, 30) timezone = pytz.timezone('Europe/Paris') date_delayed = timezone.localize(date_delayed) schedule = rrule('SECONDLY', interval=5, count=3, dtstart=date_delayed) entry = Entry('thingo', 'cluster.add_task', schedule, args=(9, 9), app=cluster.app) entry.save()

Could you try a try_later() function in your test to tell me if you are having the same problem ? (just like you did with try_now())

Thanks for your help,

Regards

Simon

zikmout avatar Aug 29 '19 14:08 zikmout