
Celery Beat not dispatching tasks after upgrade from `2.8.0` to `2.8.1`

Open FabianClemenz opened this issue 7 months ago • 52 comments

I updated to version 2.8.1 this morning and Celery Beat has stopped dispatching tasks since then. I'm using the database scheduler and I think this is related to the change to using the server timezone.

We have the timezone set to UTC on the server but Europe/Berlin in our Django application. Setting the server timezone to Europe/Berlin as well still does not get the tasks dispatched.
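
For reference, that setup roughly corresponds to settings like these (a minimal sketch; the CELERY_TIMEZONE line is only the conventional companion setting and is an assumption here, not taken from the actual config):

# settings.py (sketch)
TIME_ZONE = "Europe/Berlin"        # Django application timezone
USE_TZ = True
CELERY_TIMEZONE = "Europe/Berlin"  # assumed; if omitted, Celery defaults to UTC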

https://github.com/celery/django-celery-beat/pull/886

FabianClemenz avatar May 14 '25 10:05 FabianClemenz

We see something similar on our end.

Basically, ever since our upgrade to 2.8.0 (and now 2.8.1) we have had intermittent failures where beat tasks are no longer triggered. The only way to get them triggered again is, for some reason, to trigger a new build on our servers (this was never the case before 2.8.0/2.8.1).

If I'm not mistaken, the original issue comes from this PR: https://github.com/celery/django-celery-beat/pull/879. Can't we simply revert that PR (and the various fixes since then) and publish a 2.8.2?

To be clear: upgrading to 2.8.1 has not fixed the issue.

yanivtoledano avatar May 14 '25 11:05 yanivtoledano

On my side, I don't know why it worked with 2.8.0 before upgrading but now, after downgrading, it doesn't work...

FabianClemenz avatar May 14 '25 11:05 FabianClemenz

Yes, I have the same issue

vzii avatar May 14 '25 13:05 vzii

We have a new PR (https://github.com/celery/django-celery-beat/pull/896); I'm not sure whether it addresses your issue. Can you check?

auvipy avatar May 15 '25 05:05 auvipy

@auvipy are there any docs on how to correctly set up and test celery beat locally for contributing?

FabianClemenz avatar May 15 '25 05:05 FabianClemenz

You can check https://github.com/celery/django-celery-beat?tab=readme-ov-file#developing-django-celery-beat - we also use tox for test automation.

auvipy avatar May 15 '25 05:05 auvipy

OK, thanks. When I get time I'll check it. I'll try to manually reproduce the wrong times so I can see where the error happens.

FYI: On one of our servers I manually reset the tasks as you mention in your docs for changing the timezone. This morning the tasks ran, but with UTC instead of Europe/Berlin.

FabianClemenz avatar May 15 '25 05:05 FabianClemenz

The only way to trigger them is to trigger a new build on our servers for some reason

@yanivtoledano Can you explain this more?

alirafiei75 avatar May 15 '25 05:05 alirafiei75

We have a new PR (#896); I'm not sure whether it addresses your issue. Can you check?

I do not think this new PR is related to this issue. I also do not think anyone is using SQLite in production, since that is bad practice.

alirafiei75 avatar May 15 '25 05:05 alirafiei75

@alirafiei75 - answers below:

Can you explain this more?

Our Celery Beat tasks run fine throughout the day (cron jobs are triggered as expected). Every night, though, the tasks stop triggering somewhere between 2 and 4 AM UTC. We know this because we have tasks running every 5 minutes which suddenly stop. This issue did not exist before version 2.8.0 (we never had any issue with Celery Beat skipping or stopping). The only way to get tasks running again in production is to rebuild (e.g., by pushing a commit to prod).

I do not think that this new PR is related to this issue.

That's my mistake then (we run Postgres in production). Basically, before 2.8.x we never had issues with Celery Beat. Ever since upgrading we have daily issues. We can't downgrade to 2.7.x because we're on Django 5.2.x.

yanivtoledano avatar May 15 '25 09:05 yanivtoledano

FYI, our task scheduling also broke after the latest dependency update. However, it does not seem to be directly related to Celery Beat. Rather, updating click (a Celery dependency) to version 8.2.0 seems to be the cause. Pinning it to 8.1.8 resolved the issue.

alexbehl avatar May 15 '25 11:05 alexbehl

@yanivtoledano I'm having the exact same issue. Interval tasks get picked back up, but all early-morning crontab tasks have stopped running. This started after upgrading from 2.7.0 to 2.8.0.

After downgrading back to 2.7.0, the issue was fully resolved.

jp26jp avatar May 15 '25 11:05 jp26jp

FYI, our task scheduling also broke after the latest dependency update. However, it does not seem to be directly related to Celery Beat. Rather, updating click (a Celery dependency) to version 8.2.0 seems to be the cause. Pinning it to 8.1.8 resolved the issue.

We have an environment using click 8.1.8 with django-celery-beat 2.8.0 and still see the same issue.

yjchen-tw avatar May 15 '25 13:05 yjchen-tw

@yanivtoledano I'm having the exact same issue. Interval tasks get picked back up, but all early-morning crontab tasks have stopped running. This started after upgrading from 2.7.0 to 2.8.0.

After downgrading back to 2.7.0, the issue was fully resolved.

We had downgraded from 2.8.1 to 2.7.0 as well; however, that did not resolve it for our interval tasks (downgrading click helped, though).

Gornoka avatar May 16 '25 08:05 Gornoka

~~Can confirm from our end that pinning click = "8.1.8" solved the issue.~~

yanivtoledano avatar May 16 '25 09:05 yanivtoledano

Can confirm from our end that pinning click = "8.1.8" solved the issue.

I tried pinning click to version 8.1.8, but the issue is still there.

vzii avatar May 16 '25 14:05 vzii

In my case, when I set USE_TZ = False the tasks won't run, but when it's True they will. I don't know whether my issue is related to this topic or not.

mkanani7 avatar May 17 '25 06:05 mkanani7

In my case, when I set USE_TZ = False the tasks won't run, but when it's True they will. I don't know whether my issue is related to this topic or not.

What are your TIME_ZONE and CELERY_TIMEZONE settings? Do you have either of them set?

alirafiei75 avatar May 17 '25 07:05 alirafiei75

In my case, when I set USE_TZ = False the tasks won't run, but when it's True they will. I don't know whether my issue is related to this topic or not.

What are your TIME_ZONE and CELERY_TIMEZONE settings? Do you have either of them set?

My TIME_ZONE is "Asia/Tehran" but I don't have CELERY_TIMEZONE. Should I add CELERY_TIMEZONE?

mkanani7 avatar May 17 '25 09:05 mkanani7

In my case, when I set USE_TZ = False the tasks won't run, but when it's True they will. I don't know whether my issue is related to this topic or not.

What are your TIME_ZONE and CELERY_TIMEZONE settings? Do you have either of them set?

My TIME_ZONE is "Asia/Tehran" but I don't have CELERY_TIMEZONE. Should I add CELERY_TIMEZONE?

Since you did not set CELERY_TIMEZONE and USE_TZ is False, I think the crontabs are being saved with the default timezone (UTC), and that is what causes the problem. For example, you saved the crontab hour as 3, meaning 3 in Tehran time, but Celery assumes it is UTC. This is what I think might be happening.
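
A minimal sketch of that mismatch with django-celery-beat's CrontabSchedule and PeriodicTask models (the task name and dotted path below are hypothetical, and the "falls back to UTC" comment assumes CELERY_TIMEZONE is unset, as described above):

from zoneinfo import ZoneInfo

from django_celery_beat.models import CrontabSchedule, PeriodicTask

# Saved without an explicit timezone, "hour 3" falls back to the default (UTC here),
# so it fires at 03:00 UTC, i.e. 06:30 in Asia/Tehran, not the intended local 03:00.
utc_schedule, _ = CrontabSchedule.objects.get_or_create(minute="0", hour="3")

# Storing the intended timezone on the schedule removes the ambiguity.
tehran_schedule, _ = CrontabSchedule.objects.get_or_create(
    minute="0", hour="3", timezone=ZoneInfo("Asia/Tehran")
)
PeriodicTask.objects.update_or_create(
    name="nightly-report",  # hypothetical name
    defaults={
        "crontab": tehran_schedule,
        "task": "myapp.tasks.nightly_report",  # hypothetical task path
    },
)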

alirafiei75 avatar May 17 '25 09:05 alirafiei75

Can confirm from our end that pinning click = "8.1.8" solved the issue.

Update on this: pinning click to 8.1.8 only fixed the issue for one night. The same issue as before happened during the second night (so click doesn't seem to be the sole issue here).

yanivtoledano avatar May 17 '25 11:05 yanivtoledano

I'm not sure if I'm experiencing the same issue as you guys.

I have a task that runs every second (heartbeat), but it appears to stop executing after an hour. I'm on Python 3.13, Beat 2.8.1, Celery 5.5.2, and Redis 6.1.0. Beat just stops waking up. I've got USE_TZ set to True and TIME_ZONE as UTC.

Does this seem related?

michael-trovato avatar May 17 '25 12:05 michael-trovato

Does this seem related?

Yes, seems related to me. What's your Django version?

yanivtoledano avatar May 17 '25 13:05 yanivtoledano

I'm on Django 5.2.1.

I'm using Redis for the broker and result backend. The scheduler is the DatabaseScheduler, using PostgreSQL with psycopg 3.2.9.

I'm not sure where to go from here.

michael-trovato avatar May 17 '25 13:05 michael-trovato

By any chance, do you have PeriodicTask.expires set on your tasks? The behavior you are describing (tasks run at first but stop running on their own after a certain amount of time) is exactly what expires does in the PeriodicTask model.
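
Purely as an illustration of that behaviour (not necessarily your setup; the names below are hypothetical), a heartbeat task with a fixed expires about an hour ahead would show the same pattern:

from datetime import timedelta

from django.utils import timezone
from django_celery_beat.models import IntervalSchedule, PeriodicTask

every_second, _ = IntervalSchedule.objects.get_or_create(
    every=1, period=IntervalSchedule.SECONDS
)
PeriodicTask.objects.update_or_create(
    name="heartbeat",  # hypothetical name
    defaults={
        "interval": every_second,
        "task": "myapp.tasks.heartbeat",  # hypothetical task path
        # With a fixed expiry roughly an hour ahead, the task stops running
        # once that datetime passes, matching the symptom described above.
        "expires": timezone.now() + timedelta(hours=1),
    },
)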

alirafiei75 avatar May 17 '25 13:05 alirafiei75

I use time delta expiry, but in this case, beat doesn't even initiate the task. It seems that it is getting blocked. I checked the last run time, and it lasted around 57 minutes.

If I stop the container, then it wakes back up, raises this exception, and finishes sending some tasks.

Exception ignored in: <function AsyncResult.__del__ at 0x779c9804dc60>
Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/celery/result.py", line 417, in __del__
    self.backend.remove_pending_result(self)
  File "/usr/local/lib/python3.13/site-packages/celery/backends/asynchronous.py", line 208, in remove_pending_result
    self.on_result_fulfilled(result)
  File "/usr/local/lib/python3.13/site-packages/celery/backends/asynchronous.py", line 216, in on_result_fulfilled
    self.result_consumer.cancel_for(result.id)
  File "/usr/local/lib/python3.13/site-packages/celery/backends/redis.py", line 184, in cancel_for
    self._pubsub.unsubscribe(key)
  File "/usr/local/lib/python3.13/site-packages/redis/client.py", line 1055, in unsubscribe
    return self.execute_command("UNSUBSCRIBE", *args)
  File "/usr/local/lib/python3.13/site-packages/redis/client.py", line 876, in execute_command
    self._execute(connection, connection.send_command, *args, **kwargs)
  File "/usr/local/lib/python3.13/site-packages/redis/client.py", line 914, in _execute
    return conn.retry.call_with_retry(
  File "/usr/local/lib/python3.13/site-packages/redis/retry.py", line 90, in call_with_retry
    fail(error)
  File "/usr/local/lib/python3.13/site-packages/redis/client.py", line 916, in <lambda>
    lambda _: self._reconnect(conn),
  File "/usr/local/lib/python3.13/site-packages/redis/client.py", line 904, in _reconnect
    conn.connect()
  File "/usr/local/lib/python3.13/site-packages/redis/connection.py", line 379, in connect
    self.connect_check_health(check_health=True)
  File "/usr/local/lib/python3.13/site-packages/redis/connection.py", line 413, in connect_check_health
    callback(self)
  File "/usr/local/lib/python3.13/site-packages/redis/client.py", line 834, in on_connect
    self.subscribe(**channels)
  File "/usr/local/lib/python3.13/site-packages/redis/client.py", line 1030, in subscribe
    ret_val = self.execute_command("SUBSCRIBE", *new_channels.keys())
  File "/usr/local/lib/python3.13/site-packages/redis/client.py", line 875, in execute_command
    with self._lock:
  File "/usr/local/lib/python3.13/site-packages/celery/apps/beat.py", line 159, in _sync
    raise SystemExit()
SystemExit

Does this provide a clue? Could it be related to the connection to redis?

michael-trovato avatar May 17 '25 14:05 michael-trovato

I use time delta expiry, but in this case, beat doesn't even initiate the task. It seems that it is getting blocked. I checked the last run time, and it lasted around 57 minutes.

If I stop the container, then it wakes back up, raises this exception, and finishes sending some tasks. Does this provide a clue? Could it be related to the connection to redis?

The traceback you are providing does not show anything in particular, and I think it is due to the shutdown process not being graceful. But it is indeed related to Redis, which you seem to be using as the result backend. Can you temporarily disable the result backend to check whether the tasks run without hiccups?
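
A sketch of that temporary check, assuming the conventional CELERY_ settings namespace (adjust the names to your own configuration):

# settings.py (temporary, for debugging only)
CELERY_RESULT_BACKEND = None       # run without the Redis result backend
# or keep the backend configured but stop storing task results:
CELERY_TASK_IGNORE_RESULT = True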

alirafiei75 avatar May 17 '25 14:05 alirafiei75

It seems that reverting redis-py to version 5.2.1 has fixed the issue.

Maybe this issue is related?

https://github.com/redis/redis-py/issues/3640

michael-trovato avatar May 17 '25 23:05 michael-trovato

I'm investigating the same problem, and I think it's tied to the celery.backend_cleanup task. In my testing with debug logging enabled, twice now I have seen `beat: Waking up in 5.00 seconds.` printed over and over until the celery.backend_cleanup task is sent. At that point I don't see the "Waking up" messages anymore and beat stops sending new tasks.

Yesterday's testing:

[...lots of waking up messages...]
[2025-05-18 03:59:20,838: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
[2025-05-18 03:59:25,840: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
[2025-05-18 03:59:30,842: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
[2025-05-18 03:59:35,843: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
[2025-05-18 03:59:40,845: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
[2025-05-18 03:59:45,847: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
[2025-05-18 03:59:50,848: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
[2025-05-18 03:59:55,850: DEBUG/MainProcess] beat: Waking up in 4.14 seconds.
[2025-05-18 04:00:00,000: INFO/MainProcess] Scheduler: Sending due task celery.backend_cleanup (celery.backend_cleanup)
[2025-05-18 04:00:00,002: DEBUG/MainProcess] celery.backend_cleanup sent. id->cff83f3d-7392-48a1-abff-cc95d78080bb
[2025-05-18 04:00:00,003: INFO/MainProcess] Task celery.backend_cleanup[cff83f3d-7392-48a1-abff-cc95d78080bb] received
[2025-05-18 04:00:00,005: INFO/ForkPoolWorker-12] Task celery.backend_cleanup[cff83f3d-7392-48a1-abff-cc95d78080bb] succeeded in 0.0018318749498575926s: None
[...no more waking up...]

Today the same:

[...lots of waking up messages...]
[2025-05-19 03:59:20,957: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
[2025-05-19 03:59:25,959: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
[2025-05-19 03:59:30,960: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
[2025-05-19 03:59:35,962: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
[2025-05-19 03:59:40,963: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
[2025-05-19 03:59:45,965: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
[2025-05-19 03:59:50,967: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
[2025-05-19 03:59:55,968: DEBUG/MainProcess] beat: Waking up in 4.02 seconds.
[2025-05-19 04:00:00,002: INFO/MainProcess] Scheduler: Sending due task celery.backend_cleanup (celery.backend_cleanup)
[2025-05-19 04:00:00,013: DEBUG/MainProcess] celery.backend_cleanup sent. id->01df024d-bd66-4174-82b8-4750d59d344f
[2025-05-19 04:00:00,013: INFO/MainProcess] Task celery.backend_cleanup[01df024d-bd66-4174-82b8-4750d59d344f] received
[2025-05-19 04:00:00,016: INFO/ForkPoolWorker-12] Task celery.backend_cleanup[01df024d-bd66-4174-82b8-4750d59d344f] succeeded in 0.0016798160504549742s: None
[...no more waking up...]

So more or less, the backend_cleanup task kills the beat scheduler for me.
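
As far as I know, Celery's beat scheduler only installs the default celery.backend_cleanup entry (on a crontab of 0 4 * * *, which matches the 04:00 timestamps above) when result_expires is enabled and the result backend does not auto-expire. If that is what is happening here, a sketch of removing the entry from the schedule, assuming the conventional CELERY_ settings namespace:

# settings.py (sketch)
CELERY_RESULT_EXPIRES = None  # with result expiry disabled, beat should not
                              # install the built-in celery.backend_cleanup entry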

bbilly1 avatar May 19 '25 02:05 bbilly1

I've manually reset the scheduled tasks as stated here: https://django-celery-beat.readthedocs.io/en/latest/#important-warning-about-time-zones - it seems to be running correctly now.
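
For reference, the reset from that docs section is roughly the following, run in python manage.py shell and followed by a beat restart:

from django_celery_beat.models import PeriodicTask

# Clearing last_run_at makes the DatabaseScheduler recompute every schedule.
PeriodicTask.objects.update(last_run_at=None)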

Packages:

  • django 5.2.1
  • celery 5.5.2
  • django-celery-beat 2.8.1
  • click 8.2.0
  • redis 6.1.0 (used as the result backend, not the broker)
  • django-redis 5.4.0

Insights:

  • Ubuntu Server running docker containers
  • Using a dedicated beat container instead of the worker with the -B option, as before
  • Timezone on Ubuntu Server -> UTC
  • Timezone in Docker Containers -> UTC
  • Timezone inside django shell (datetime) -> Europe/Berlin (settings: USE_TZ = True, TIME_ZONE = Europe/Berlin)
  • Timezone with django.utils.timezone -> UTC

My tasks are configured with the Europe/Berlin timezone.

FabianClemenz avatar May 19 '25 07:05 FabianClemenz