Feature request: Passing the scheduled start time to jobs

Open AdrianTeng opened this issue 6 years ago • 21 comments

Currently there is no way for the job to know its scheduled start time (only the current time, via datetime.now()). This feature would be useful for time-sensitive jobs (e.g. extra handling if the job is delayed).

AdrianTeng avatar Mar 20 '18 17:03 AdrianTeng

http://apscheduler.readthedocs.io/en/latest/modules/triggers/cron.html#module-apscheduler.triggers.cron Isn't "start_date" what you are looking for?

DeeVeX avatar Mar 23 '18 21:03 DeeVeX

@DeeVeX Nope. From what I understand from the docs, start_date sets the CronTrigger's first trigger time. What I want is to know, for each job run, what the scheduled trigger time was, and to have that passed into the job.

AdrianTeng avatar Mar 25 '18 13:03 AdrianTeng

Oh, that would be nice. I could use the same thing for a job's last run to do cleanups!

DeeVeX avatar Mar 25 '18 22:03 DeeVeX

Is anyone working on this?

zhipcui avatar Aug 29 '18 11:08 zhipcui

Yes, this feature is useful for time-sensitive jobs. I also need it in my current project. Is there any work going on for this? I am also open to submitting a PR if someone can point me in the right direction.

Till then a workaround that I can think of is:

  1. When adding the job to the scheduler with add_job(), pass an explicit id and store it somewhere.
  2. When the job runs, fetch the job from the scheduler with get_job() using the stored id, and then compute: job.next_run_time - trigger's value

Something of that sort should work.

But I am not sure it is foolproof. Also, what would happen when using a DB job store if the scheduler goes down, jobs are missed, and then the scheduler comes back up? Would the calculation still be correct in that case?

Either way waiting for this feature...

viiicky avatar Sep 07 '18 06:09 viiicky

Also I guess calling it scheduled_time/nominal_time instead of scheduled_start_time would be less confusing. @AdrianTeng @DeeVeX

viiicky avatar Sep 07 '18 06:09 viiicky

My use case, by the way, is explained below:

Suppose a job is scheduled to poll data from a server every 10 minutes. The request made to the server might need a start_time and an end_time, something like fetch_messages_from_smtp_server(start_time, end_time) or an equivalent REST API.

So, I will add_job to the scheduler with a start time of, say, 7th Sept 2018 12:00:00 UTC and an interval of 10 minutes. Inside the job definition I would currently do something along these lines (rough sketch):

from datetime import datetime, timedelta

start_time = datetime.utcnow()
end_time = start_time + timedelta(minutes=10)
fetch_messages_from_smtp_server(start_time, end_time)

Now the catch here is, in an ideal world, I would expect the value of start_time to be as follows in the subsequent runs:

7th Sept 2018 12:00:00 UTC
7th Sept 2018 12:10:00 UTC
7th Sept 2018 12:20:00 UTC
...

But chances are that these values might be delayed because of several reasons, in which case I might get undesired results from the server.

For example, if the actual first run time is 7th Sept 2018 12:00:30 UTC instead of the planned/scheduled 7th Sept 2018 12:00:00 UTC, my request would ask for all the messages between 7th Sept 2018 12:00:30 UTC and 7th Sept 2018 12:10:30 UTC. I would miss the messages from the first 30 secs and would possibly get some extra messages from the last 30 secs. Then, if the next run starts exactly on schedule, I would get duplicate messages for its first 30 secs, as they were already retrieved in the previous run (a problem caused by the previous run, not this one).

If there is a way where I could access the actual scheduled_time/nominal_time instead of calling datetime.utcnow for each run, this would not be a problem.
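Until the scheduler exposes the nominal time, one way to approximate it inside the job is to snap the current time back onto the schedule grid. This is only a minimal sketch, assuming a fixed interval anchored at a known first run time; `nominal_window` is a hypothetical helper, not part of APScheduler:

```python
from datetime import datetime, timedelta, timezone


def nominal_window(now, first_run, interval):
    """Return (start, end) of the scheduled window that `now` falls into.

    Hypothetical helper: assumes runs are planned at first_run + k * interval
    and that the job starts less than one interval late.
    """
    if now < first_run:
        raise ValueError("now is before the first scheduled run")
    # Number of whole intervals elapsed since the first scheduled run.
    elapsed_intervals = (now - first_run) // interval
    start = first_run + elapsed_intervals * interval
    return start, start + interval


first_run = datetime(2018, 9, 7, 12, 0, 0, tzinfo=timezone.utc)
interval = timedelta(minutes=10)

# The job actually starts 30 seconds late.
late_now = first_run + timedelta(seconds=30)
start_time, end_time = nominal_window(late_now, first_run, interval)
```

With this, the delayed run still requests the 12:00:00 to 12:10:00 window. The approach stops working if a run is delayed by a full interval or more, which is one reason a value supplied by the scheduler itself would be preferable.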

viiicky avatar Sep 07 '18 07:09 viiicky

I need this feature too for measuring latency. Seems like it would be very easy to implement.

Just add { "scheduled_run_time": run_time } or something like that to **kwargs in line 125 of the BaseExecutor class - "retval = job.func(*job.args, **job.kwargs)".

Anyway, not sure the best way to do it, but since the value is right there, including it in the job.func call should be easy enough.
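The idea can be sketched without touching APScheduler itself. The following is a minimal illustration, not the actual BaseExecutor code: `call_with_run_time` and the `scheduled_run_time` keyword are hypothetical names, and the job function has to accept that keyword.

```python
from datetime import datetime, timedelta, timezone


def call_with_run_time(func, args, kwargs, run_time):
    """Call func, adding the scheduled run time as an extra keyword argument.

    Hypothetical stand-in for the job.func(*job.args, **job.kwargs) call.
    """
    merged = dict(kwargs)  # copy so the stored job kwargs are not mutated
    merged["scheduled_run_time"] = run_time
    return func(*args, **merged)


def measure_latency(label, scheduled_run_time, now=None):
    """Example job: report how late it started relative to the schedule."""
    now = now or datetime.now(timezone.utc)
    return label, now - scheduled_run_time


run_time = datetime(2019, 3, 24, 5, 0, 0, tzinfo=timezone.utc)
label, latency = call_with_run_time(
    measure_latency,
    ("poll",),
    {"now": run_time + timedelta(seconds=2)},
    run_time,
)
```

One drawback of this approach is that every existing job function that does not accept the extra keyword would fail, so it would probably need to be opt-in.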

richwifunds avatar Mar 24 '19 05:03 richwifunds

I think this is a widely required feature in such cron-like schedulers, but it's quite strange that neither apscheduler nor quartz provides it.

vision57 avatar Apr 28 '19 12:04 vision57

This will be implemented in v4.0 in such a manner that the target function can receive all sorts of information about the job.

agronholm avatar Jul 23 '19 09:07 agronholm

Any update on this?

richwifunds avatar Oct 19 '19 20:10 richwifunds

@agronholm - has this been implemented and if not do you still plan to implement it and when?

richwifunds avatar Feb 26 '20 19:02 richwifunds

Preliminary work has been done but I'm mostly focusing on the highlight feature (data store sharing) now. It will take some time before 4.0 is in any usable state.

agronholm avatar Feb 27 '20 05:02 agronholm

I hope a new event (JOB_EVENT_START) gets added for this.

5uw1st avatar Jul 21 '20 14:07 5uw1st

Is there any news on this? Do you need contributors, since the feature doesn't seem to be moving forward?

Dzeri96 avatar May 16 '22 14:05 Dzeri96

The scheduled start time will not be passed to the scheduled function directly, but will be available through a context variable. I'm not sure if this will make it into 4.0.0a1 but it should make it into the first beta.

agronholm avatar May 16 '22 14:05 agronholm

@agronholm Thanks for the quick response. Is there an ETA on v4? I've just started using this library.

Dzeri96 avatar May 16 '22 14:05 Dzeri96

I've stopped giving out ETAs as they have passed me by one by one. It's best to just follow #465 for progress updates. The first alpha only requires a couple more pushes, as soon as I can muster the willpower. I have quite a few other projects to maintain too, and those take their own share of my free time.

agronholm avatar May 16 '22 14:05 agronholm

Alright, thanks for the update. I'm guessing that the onboarding process might be so difficult at the moment that it's not worth asking for contributions from your side.

Dzeri96 avatar May 16 '22 14:05 Dzeri96

If you really need this feature, you need to fork the project and modify your executor of choice to add this information to a context variable (or a thread-local). That way the eventual transition to v4.0 should be relatively painless. I'm unfortunately not accepting contributions on 4.0 code until the code base is stable enough.
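For anyone going the fork route, the context variable part might look roughly like this. It is a sketch using only the standard library `contextvars` module; `scheduled_run_time` and `run_job` are hypothetical names, not APScheduler's API, and the real change would go wherever your executor calls the job function.

```python
import contextvars
from datetime import datetime, timezone

# Hypothetical variable name; not APScheduler's own API.
scheduled_run_time = contextvars.ContextVar("scheduled_run_time")


def run_job(func, args, kwargs, run_time):
    """Hypothetical executor hook: expose run_time while func runs."""
    token = scheduled_run_time.set(run_time)
    try:
        return func(*args, **kwargs)
    finally:
        # Restore the previous state so the value does not leak between jobs.
        scheduled_run_time.reset(token)


def my_job():
    # The job reads the value from the context instead of its arguments.
    return scheduled_run_time.get()


planned = datetime(2022, 5, 16, 15, 0, 0, tzinfo=timezone.utc)
seen = run_job(my_job, (), {}, planned)
```

Note that the variable has to be set in the same thread or task that actually runs the job, since each thread has its own context.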

agronholm avatar May 16 '22 15:05 agronholm

I'll just write a function that parses the DB and inserts missing data for now. Looking forward to the release though.

Dzeri96 avatar May 16 '22 15:05 Dzeri96

Implemented in v4.0.0a1 via the contextvar apscheduler.current_job.

agronholm avatar Aug 17 '22 22:08 agronholm