apscheduler
APScheduler executes the same task four times when deployed under Django/uWSGI
Hey,
I integrated APScheduler into my app for a newsletter feature: the site owner can create a post that is published at a certain time (e.g. next week), and for each such post I also create a task that sends an email notification to my clients.
Everything had been working correctly for months in the app deployed under Django/uWSGI. But now the task gets executed up to four times:
Adding job tentatively -- it will be properly scheduled when the scheduler starts
Added job "patch_mysql_connection" to job store "default"
Scheduler started
Looking for jobs to run
Next wakeup is due at 2022-07-29 09:00:00+02:00 (in 151498.751408 seconds)
Adding job tentatively -- it will be properly scheduled when the scheduler starts
Added job "patch_mysql_connection" to job store "default"
Scheduler started
Looking for jobs to run
Next wakeup is due at 2022-07-29 09:00:00+02:00 (in 151497.970212 seconds)
Adding job tentatively -- it will be properly scheduled when the scheduler starts
Added job "patch_mysql_connection" to job store "default"
Scheduler started
Looking for jobs to run
Next wakeup is due at 2022-07-29 09:00:00+02:00 (in 151497.890432 seconds)
Adding job tentatively -- it will be properly scheduled when the scheduler starts
Added job "patch_mysql_connection" to job store "default"
Scheduler started
Looking for jobs to run
Next wakeup is due at 2022-07-29 09:00:00+02:00 (in 151497.502388 seconds)
Looking for jobs to run
Looking for jobs to run
Looking for jobs to run
Looking for jobs to run
Running job "patch_mysql_connection (trigger: date[2022-07-29 07:00:00 UTC], next run at: 2022-07-29 07:00:00 UTC)" (scheduled at 2022-07-29 07:00:00+00:00)
Running job "patch_mysql_connection (trigger: date[2022-07-29 07:00:00 UTC], next run at: 2022-07-29 07:00:00 UTC)" (scheduled at 2022-07-29 07:00:00+00:00)
Removed job newsletter_job_563
Removed job newsletter_job_563
Running job "patch_mysql_connection (trigger: date[2022-07-29 07:00:00 UTC], next run at: 2022-07-29 07:00:00 UTC)" (scheduled at 2022-07-29 07:00:00+00:00)
Running job "patch_mysql_connection (trigger: date[2022-07-29 07:00:00 UTC], next run at: 2022-07-29 07:00:00 UTC)" (scheduled at 2022-07-29 07:00:00+00:00)
No jobs; waiting until a job is added
No jobs; waiting until a job is added
Removed job newsletter_job_563
No jobs; waiting until a job is added
Removed job newsletter_job_563
No jobs; waiting until a job is added
Job "patch_mysql_connection (trigger: date[2022-07-29 07:00:00 UTC], next run at: 2022-07-29 07:00:00 UTC)" executed successfully
Job "patch_mysql_connection (trigger: date[2022-07-29 07:00:00 UTC], next run at: 2022-07-29 07:00:00 UTC)" executed successfully
Job "patch_mysql_connection (trigger: date[2022-07-29 07:00:00 UTC], next run at: 2022-07-29 07:00:00 UTC)" executed successfully
Job "patch_mysql_connection (trigger: date[2022-07-29 07:00:00 UTC], next run at: 2022-07-29 07:00:00 UTC)" executed successfully
I'm not sure why it works this way. I'm using a BackgroundScheduler as shown in
the documentation, and it's run when the application loads, from the app config:
import logging

from apscheduler.schedulers.background import BackgroundScheduler
from django.utils import timezone

from .models import Page  # import path assumed; adjust to your app layout

scheduler = BackgroundScheduler()


def run():
    """Run scheduler."""
    logging.basicConfig()
    logging.getLogger('apscheduler').setLevel(logging.DEBUG)
    newsletter_pages = Page.objects.filter(
        add_to_newsletter=True,
        message__isnull=False,
        date_published__gt=timezone.now(),
    )
    for page in newsletter_pages:
        for submission in page.message.submission_set.filter(sent=False):
            add_newsletter_job(submission, page.date_published)
    scheduler.start()
def ready(self):
    from . import scheduler
    import example_app.signals
    scheduler.run()
I thought there could be multiple instances of a submission, but the iteration over submission_set
is correct. I also don't understand why the task runs multiple times, even when I give it a specific ID:
from django.db import connection


def patch_mysql_connection(func):
    """Patch the MySQL connection before executing the function, to work around the "MySQL server has gone away" problem."""
    connection.close()
    connection.connect()
    func()


def add_newsletter_job(instance, publish_date):  # noqa: D103
    scheduler.add_job(
        patch_mysql_connection,
        args=[instance.submit],
        trigger='date',
        run_date=publish_date,
        id=f'newsletter_job_{instance.id}',
        replace_existing=True,
    )
The job is based on an instance of Page (a post added by the site owner) from Django. The same newsletter about the post is sent four times, even though there can be only one job with that ID.
The task is scheduled from a post_save signal, created this way:
from datetime import timedelta

from django.db.models.signals import post_save
from django.dispatch import receiver
from django.utils import timezone


@receiver(post_save, sender=Page)
def page_created(sender, instance, created, **kwargs):
    if instance.add_to_newsletter and instance.message:
        for submission in instance.message.submission_set.filter(sent=False):
            if scheduler.get_job(f'newsletter_job_{submission.id}') and instance.date_published > timezone.now():
                reschedule_newsletter_job(submission, instance.date_published)
            elif instance.date_published > timezone.now():
                add_newsletter_job(submission, instance.date_published)
            else:
                add_newsletter_job(submission, timezone.now() + timedelta(minutes=1))
I even check whether the job already exists, so I reschedule it only when I need to. Any suggestions?
Sorry, but I don't see a way for me to reproduce your situation. Are you absolutely certain that you aren't running multiple uWSGI workers? This part of the logs would indicate four workers:
Looking for jobs to run
Looking for jobs to run
Looking for jobs to run
Looking for jobs to run
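Each worker process imports the Django app, so each one runs ready() and starts its own BackgroundScheduler with its own default in-memory job store. The job ID is only unique within a single store, which is why replace_existing=True doesn't deduplicate across processes. A minimal sketch of the effect (illustrative, not your code):

import time

from apscheduler.schedulers.background import BackgroundScheduler

# Two independent schedulers, like two uWSGI workers: each has its own
# default MemoryJobStore, so the "unique" ID exists once per store.
for sched in (BackgroundScheduler(), BackgroundScheduler()):
    sched.add_job(
        print,                    # stand-in for patch_mysql_connection
        trigger='date',           # no run_date -> fires immediately
        args=['newsletter sent'],
        id='newsletter_job_563',  # the same ID is accepted by both
        replace_existing=True,
    )
    sched.start()

time.sleep(1)  # let both daemon threads fire: "newsletter sent" prints twice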
Okay, so how would I know that? The server is configured by our administrator, so I would assume he set it up the default way.
Is this enough to limit the workers to just one?
from apscheduler.executors.pool import ProcessPoolExecutor, ThreadPoolExecutor
from apscheduler.schedulers.background import BackgroundScheduler

# Configure the project to have just a single worker:
executors = {
    'default': ThreadPoolExecutor(1),       # max threads: 1
    'processpool': ProcessPoolExecutor(1),  # max processes: 1
}
scheduler = BackgroundScheduler(executors=executors)
(I don't need to run multiple workers across multiple threads; a single thread is enough for me, since the task I schedule is pretty simple and not very resource-bound.)
I wasn't referring to worker threads, but to uWSGI worker processes. See the relevant FAQ entry for background. APScheduler 4.0 will remove this restriction, but it's currently only alpha quality (4.0.0a1 coming up soon), so it's not yet ready for production use.
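For reference, the direction the FAQ points to is running the scheduler in a single dedicated process instead of inside every web worker. A rough sketch of that as a Django management command (the command name and import path are illustrative, not from this thread):

# example_app/management/commands/run_scheduler.py
import time

from django.core.management.base import BaseCommand

from example_app import scheduler  # the module shown earlier in this thread


class Command(BaseCommand):
    help = 'Run the newsletter scheduler in one dedicated process.'

    def handle(self, *args, **options):
        scheduler.run()  # schedules pending newsletters and starts the scheduler
        try:
            while True:           # BackgroundScheduler runs in a daemon thread,
                time.sleep(3600)  # so keep this process alive
        except KeyboardInterrupt:
            scheduler.scheduler.shutdown()  # the BackgroundScheduler instance

With that in place, scheduler.run() would be removed from AppConfig.ready(), so the uWSGI workers never start a scheduler of their own.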
Ok, thanks. I will probably move temporarily to Celery, since I'm familiar with it and I'm not really keen on using gRPC or RPyC. I wish you all the best with the 4.0 release!