timescaledb icon indicating copy to clipboard operation
timescaledb copied to clipboard

Job scheduling should be able to schedule exact time of execution, not delay between jobs

Open EcBen opened this issue 3 years ago • 13 comments

Relevant system information:

  • OS: Ubuntu 20.04 (LTS) x64
  • PostgreSQL version: PostgreSQL 12.6 (Ubuntu 12.6-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
  • TimescaleDB version: 2.0.0
  • Installation method: apt install

Describe the bug I have a User-Defined Action that I want to run nightly at 1AM. It takes 4 hours to run. Each night, the next_start drifts later by the 4 hour run time of the job/action.

To Reproduce Steps to reproduce the behavior:

  1. Add the job:
SELECT add_job(
  'myjob', 
  '1d', 
  '{}'::JSONB, 
  date_trunc('day', NOW()) + INTERVAL '1 day' + INTERVAL '1 hour' -- Start tomorrow at 1AM
); 
  1. Wait a day
  2. See next_start time:
SELECT * FROM timescaledb_information.jobs;

Expected behavior When I use 1d for the schedule_interval, I expect the job to run at the same time each day.

Actual behavior The job drifts forward every day by the amount of time the job takes to run.

Screenshots Here's a screenshot of the CPU as my 4 hour long job runs. Notice each day, it drifts forward by 4 hours: job_start_drift

Here's a screenshot of the job's configuration: job_config

EcBen avatar Feb 19 '21 16:02 EcBen

Hi @EmergentCybernetics This is a known aspect of the scheduler, that the interval is the amount of time between executions, and doesn't guarantee that the program runs at exactly the same time each week.

Understand that both behaviors are desirable; will log as an enhancement request. Thanks!

mfreed avatar Feb 20 '21 13:02 mfreed

@mfreed Thanks for the quick response. I expected this was because of Timescale using between. I love being able to schedule jobs in the database, but for this to be useful, I need the jobs to run at the same time each day.

I'm planning on adding another job that runs every 8 hours and edits the next_start value for the other job to always be 1AM. Do you see any downside to that approach?

EcBen avatar Feb 22 '21 23:02 EcBen

Hi @EmergentCybernetics Yeah, I've heard of others do that as well, and don't see a reason it shouldn't work. Appreciate too this as a tracking issue for the request; you aren't the first to ask for it.

mfreed avatar Feb 23 '21 21:02 mfreed

+1 for

  • keeping the current time between interval behavior, I need it for some use cases
  • adding a new scheduled at behavior, like cron basically.

problem with the second behavior is with overlapping jobs. For example, a job scheduled at every minute, but the previous job takes longer than a minute, what do you do?

jflambert avatar Mar 01 '21 16:03 jflambert

+1 for this feature - we need a clock aligned scheduler

alysidi avatar May 14 '21 13:05 alysidi

problem with the second behavior is with overlapping jobs. For example, a job scheduled at every minute, but the previous job takes longer than a minute, what do you do?

This actually becomes a semantically-complicated question (because if you consistently would miss, you'd never "catch up").

Perhaps the initial implementation of this could simplify the problem: Next job is scheduled at next clock_aligned time. So if you "miss" the next scheduled time because the last job took too long -> too bad, you just wait for the next scheduled interval.

That is, if a task is meant to run every minute (say, 12:00:00, 12:01:00, and one execution starting at 12:00:00 takes 65 seconds, the next execution is at 12:02:00).

This isn't a problem if your scheduled times are far apart (hour, day) compared to execution time, but makes it less appropriate for a job to be run every second. But perhaps that's acceptable as well, as for something you want to run secondly, can revert to the current delay-based implementation.

Thoughts from requestors?

mfreed avatar Aug 19 '21 03:08 mfreed

I think most people would expect it to work like cron, where jobs would overlap if they take too long to run.

drpebcak avatar Aug 19 '21 03:08 drpebcak

But UNIX cron probably doesn't have the same problem like max_background_workers as the database, and perhaps even worse, doesn't typically deal with the complexities of lock discipline.

mfreed avatar Aug 19 '21 03:08 mfreed

For what it's worth, pg_cron doesn't support overlapping jobs either.

https://github.com/citusdata/pg_cron#what-is-pg_cron

pg_cron can run multiple jobs in parallel, but it runs at most one instance of a job at a time. If a second run is supposed to start before the first one finishes, then the second run is queued and started as soon as the first run completes.

Which means that this can drift as well, but only if the scheduling interview < or ~ execution time.

So it wouldn't drift if your scheduling interview >> execution time. But in that case, my even simpler proposal ("wait for next scheduled interval") would work the same as well. 🤷

mfreed avatar Aug 19 '21 03:08 mfreed

True, it probably is safer all around not to allow a job to overlap. Both because of the way background workers are and also because of the way sql works - in most cases a duplicate job would probably cause some locking nightmares.

For me the biggest benefit would be that if you wanted to run something once a day or once a month, it would reliably start running at the exact same time every time.

drpebcak avatar Aug 19 '21 03:08 drpebcak

I implemented an - at least for me - working workaround. It is possible to change the "next_start"-attribute of a job from inside the job itself. In the procedure which is called by the job (PSQL-Pseudocode) (most probably at the "end" of the procedure)

  1. get the "last_run_started" entry of the job (in the procedure/function
-- Create next schedule (_job_id is the id of the job which is provided as parameter to the job procedure)
select last_run_started_at into schedule_start from timescaledb_information.job_stats where timescaledb_information.job_stats.job_id = _job_id ;
  1. Change the next_start - entry of the job:
PERFORM alter_job(job_id, next_start => schedule_start + INTERVAL '1 day');

Note: Make sure to check, if the "last_run_started_at" entry is null. If so, have a default (eg. current_timestamp).

In general this does not account for overlapping time periods and edge-cases. Side effects are also not considered - however, for simple use-cases like mine, it's working as expected.

andnig avatar Sep 09 '21 15:09 andnig

I implemented an - at least for me - working workaround. It is possible to change the "next_start"-attribute of a job from inside the job itself.

in my case I needed to run a job every day at midnight so I simplified your code to

PERFORM alter_job(job_id, next_start => DATE_TRUNC('day', NOW()) + '1 DAY'::INTERVAL);

Thanks a lot!

jflambert avatar May 18 '22 15:05 jflambert

+1

Dashing-Nelson avatar Jul 12 '22 15:07 Dashing-Nelson

+1

ricardo-s-ferreira-alb avatar Sep 26 '22 11:09 ricardo-s-ferreira-alb

This is fixed in #4664

mkindahl avatar Nov 02 '22 12:11 mkindahl

Wonderful news

jflambert avatar Nov 02 '22 13:11 jflambert

Thank you!

Dashing-Nelson avatar Nov 02 '22 14:11 Dashing-Nelson