apscheduler
Feature request: Passing the scheduled start time to jobs
Currently there is no way for a job to know its scheduled start time (it can only get the current time by calling datetime.now()). This feature would be useful for time-sensitive jobs (e.g. extra handling if the job is delayed).
http://apscheduler.readthedocs.io/en/latest/modules/triggers/cron.html#module-apscheduler.triggers.cron Isn't "start_date" what you are looking for?
@DeeVeX Nope. From what I understand from the docs, start_date sets the CronTrigger's first trigger time, whereas I want to know the scheduled trigger time of each job run and have it passed into the job.
Oh, that would be nice. I could use the same thing for a job's last run to do cleanups!
Is anyone working on this?
Yes, this feature is useful for time-sensitive jobs. I also need it in my current project. Is any work going on for this? I am also open to submitting a PR if someone can point me in the right direction.
Until then, a workaround I can think of is:
- While adding the job to the scheduler with add_job(), pass an explicit identifier and store it somewhere.
- When the job runs, fetch the job from the scheduler with get_job() using the stored id, and then compute: job.next_run_time - trigger's value.
Something of that sort should work.
But I am not sure it is foolproof. Also, what would happen when using a DB store if the scheduler goes down, jobs are missed, and then the scheduler comes back up? Would the calculation still be correct in that case?
Either way waiting for this feature...
Also, I guess calling it scheduled_time/nominal_time instead of scheduled_start_time would be less confusing.
@AdrianTeng @DeeVeX
My use case, by the way, is explained below:
Suppose a job is scheduled to poll data from a server every 10 minutes.
Now the request made to the server might need a start_time and an end_time, something like fetch_messages_from_smtp_server(start_time, end_time) or an equivalent REST API.
So I would call add_job on the scheduler with a start time of, say, 7th Sept 2018 12:00:00 UTC and an interval of 10 minutes.
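And inside the job definition I would currently do something along these lines (pseudocode; fetch_messages_from_smtp_server is a placeholder):
from datetime import datetime, timedelta
start_time = datetime.utcnow()  # actual start time, not the scheduled one
end_time = start_time + timedelta(minutes=10)
fetch_messages_from_smtp_server(start_time, end_time)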
Now the catch here is that, in an ideal world, I would expect start_time to take the following values in subsequent runs:
7th Sept 2018 12:00:00 UTC
7th Sept 2018 12:10:00 UTC
7th Sept 2018 12:20:00 UTC
.
.
.
But chances are that these values might be delayed because of several reasons, in which case I might get undesired results from the server.
For example, if the actual first run time is 7th Sept 2018 12:00:30 UTC instead of the planned/scheduled 7th Sept 2018 12:00:00 UTC, my request would ask for all the messages between 7th Sept 2018 12:00:30 UTC and 7th Sept 2018 12:10:30 UTC. In that case I would miss the messages from the first 30 seconds and would possibly get some extra messages from the last 30 seconds. Then, if the next run starts exactly on schedule, I would get duplicate messages for its first 30 seconds, because they were already retrieved by the previous run (the fault lies with the previous run, not this one), and so on.
If there were a way to access the actual scheduled_time/nominal_time instead of calling datetime.utcnow for each run, this would not be a problem.
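To make the difference concrete, here is a minimal sketch of how a job could reconstruct its nominal window on its own, assuming a fixed interval and a known first scheduled run. It is only a workaround under those assumptions and does not use any APScheduler API; the helper name nominal_window and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def nominal_window(anchor, interval, now):
    """Return the (start, end) window of the run scheduled at or before `now`.

    `anchor` is the first scheduled run time and `interval` the schedule period.
    """
    if now < anchor:
        raise ValueError("now is earlier than the first scheduled run")
    # Number of whole intervals elapsed since the first scheduled run.
    elapsed_runs = (now - anchor) // interval
    start = anchor + elapsed_runs * interval
    return start, start + interval


anchor = datetime(2018, 9, 7, 12, 0, 0, tzinfo=timezone.utc)
interval = timedelta(minutes=10)

# A run that actually started 30 seconds late is snapped back to 12:00:00.
late_now = datetime(2018, 9, 7, 12, 0, 30, tzinfo=timezone.utc)
start_time, end_time = nominal_window(anchor, interval, late_now)
```

Snapping to the interval grid only helps while the delay stays below one interval. A run delayed by more than that would be assigned the next slot and silently skip a window, which is one reason a scheduler-provided value would be more reliable.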
I need this feature too for measuring latency. Seems like it would be very easy to implement.
Just add { "scheduled_run_time": run_time } or something like that to **kwargs in line 125 of the BaseExecutor class - "retval = job.func(*job.args, **job.kwargs)".
Anyway, not sure the best way to do it, but since the value is right there, including it in the job.func call should be easy enough.
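A rough sketch of that idea, written as a standalone function rather than as a patch to APScheduler: the names run_job and report_latency are hypothetical, and scheduled_run_time is just the keyword suggested above, not an existing parameter.

```python
from datetime import datetime, timedelta, timezone


def run_job(func, args, kwargs, run_time):
    """Call func with the scheduled run time added as a keyword argument."""
    call_kwargs = dict(kwargs)  # copy so the stored job kwargs are not mutated
    call_kwargs["scheduled_run_time"] = run_time
    return func(*args, **call_kwargs)


def report_latency(name, scheduled_run_time, now=None):
    """Example job: measure how late it started relative to its schedule."""
    now = now or datetime.now(timezone.utc)
    return name, now - scheduled_run_time


scheduled = datetime(2018, 9, 7, 12, 0, 0, tzinfo=timezone.utc)
actual = scheduled + timedelta(seconds=30)
result = run_job(report_latency, ("poll",), {"now": actual}, scheduled)
```

One drawback of this approach is that every job function has to accept the extra keyword, so existing jobs without a matching parameter would fail with a TypeError.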
I think this is a widely needed feature in cron-like schedulers, so it's quite strange that neither apscheduler nor quartz provides it.
This will be implemented in v4.0 in such a manner that the target function can receive all sorts of information about the job.
Any update on this?
@agronholm - has this been implemented and if not do you still plan to implement it and when?
Preliminary work has been done but I'm mostly focusing on the highlight feature (data store sharing) now. It will take some time before 4.0 is in any usable state.
I hope a new event (JOB_EVENT_START) gets added for this.
Is there any news on this? Do you need contributors, since this doesn't seem to be moving forward?
The scheduled start time will not be passed to the scheduled function directly, but will be available through a context variable. I'm not sure if this will make it into 4.0.0a1, but it should make it into the first beta.
@agronholm Thanks for the quick response. Is there an ETA on v4? I've just started using this library.
I've stopped giving out ETAs as they have passed me by one by one. It's best to just follow #465 for progress updates. The first alpha only requires a couple more pushes, as soon as I can muster the willpower. I have quite a few other projects to maintain too, and those take their own share of my free time.
Alright, thanks for the update. I'm guessing that the onboarding process might be so difficult at the moment that it's not worth asking for contributions from your side.
If you really need this feature, you need to fork the project and modify your executor of choice to add this information to a context variable (or a thread-local). That way the eventual transition to v4.0 should be relatively painless. I'm unfortunately not accepting contributions on 4.0 code until the code base is stable enough.
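For anyone attempting such a fork, the context-variable part could look roughly like the sketch below. It uses only the standard library contextvars module; the variable scheduled_run_time and the wrapper run_job_with_context are hypothetical names, and where exactly to call the wrapper inside a given executor is left open.

```python
from contextvars import ContextVar
from datetime import datetime, timezone

# Hypothetical variable; not part of APScheduler.
scheduled_run_time = ContextVar("scheduled_run_time")


def run_job_with_context(func, args, kwargs, run_time):
    """Expose run_time through a context variable while func is running."""
    token = scheduled_run_time.set(run_time)
    try:
        return func(*args, **kwargs)
    finally:
        # Restore the previous state so the value does not leak into later jobs.
        scheduled_run_time.reset(token)


def my_job():
    # The job signature stays unchanged; it reads the value from the context.
    return scheduled_run_time.get()


planned = datetime(2018, 9, 7, 12, 0, 0, tzinfo=timezone.utc)
seen = run_job_with_context(my_job, (), {}, planned)
```

With a thread pool, the wrapper would need to run inside the worker thread, since each thread has its own context. Unlike passing an extra keyword argument, this keeps existing job functions working unchanged.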
I'll just write a function that parses the DB and inserts missing data for now. Looking forward to the release though.
Implemented in v4.0.0a1 via the contextvar apscheduler.current_job.