Celery Beat silently stops working after a period of time, without error
Summary:
Celery Beat silently fails after an unpredictable amount of time running on Digital Ocean App Platform, meaning tasks are no longer executed. There are no obvious indications in the logs.
My setup:
- Django 5.1.2
- Celery-Beat version: 2.7.0
- Celery version: 5.4.0
- Redis 7
- Postgres 16
Exact steps to reproduce the issue:
- Deploy Celery Beat to Digital Ocean App Platform as part of a Django app
- Configure a scheduled task via the database scheduler (see the sketch after this list)
- Leave Celery Beat running
- After hours/days, tasks stop being run
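For concreteness, this is roughly how a task can be registered with the database scheduler using the django_celery_beat models; the interval and name below mirror the heartbeat task mentioned later and are illustrative only:
from django_celery_beat.models import IntervalSchedule, PeriodicTask

# reuse or create a 5-minute interval, then attach a periodic task to it
schedule, _ = IntervalSchedule.objects.get_or_create(
    every=5,
    period=IntervalSchedule.MINUTES,
)
PeriodicTask.objects.get_or_create(
    name="Celery uptime heartbeat",  # display name that shows up in the beat logs
    defaults={
        "interval": schedule,
        "task": "fetch.utils.tasks.celery_uptime_heartbeat",  # dotted path of the registered task
    },
)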
Detailed information
I'm running Celery Beat on Digital Ocean App Platform (Docker-based deployments via buildpacks) via the command:
celery -A config.celery_app beat -l debug --scheduler django_celery_beat.schedulers:DatabaseScheduler
After an unpredictable amount of time (usually days) Celery Beat will stop running tasks. There are no console errors and the process doesn't crash; Celery simply stops running scheduled tasks silently. If I redeploy the app, tasks resume.
Having enabled debug logging, the final lines in the log are:
[celery-beat] [2024-11-13 00:49:27] [2024-11-13 00:49:27,236: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
[celery-beat] [2024-11-13 00:49:32] [2024-11-13 00:49:32,237: DEBUG/MainProcess] beat: Synchronizing schedule...
[celery-beat] [2024-11-13 00:49:32] [2024-11-13 00:49:32,238: DEBUG/MainProcess] Writing entries...
[celery-beat] [2024-11-13 00:49:32] [2024-11-13 00:49:32,278: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
[celery-beat] [2024-11-13 00:49:37] [2024-11-13 00:49:37,311: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
[celery-beat] [2024-11-13 00:49:42] [2024-11-13 00:49:42,348: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
[celery-beat] [2024-11-13 00:49:47] [2024-11-13 00:49:47,381: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
[celery-beat] [2024-11-13 00:49:52] [2024-11-13 00:49:52,415: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
[celery-beat] [2024-11-13 00:49:57] [2024-11-13 00:49:57,447: DEBUG/MainProcess] beat: Waking up in 2.54 seconds.
[celery-beat] [2024-11-13 00:50:00] [2024-11-13 00:50:00,068: INFO/MainProcess] Scheduler: Sending due task Celery uptime heartbeat (fetch.utils.tasks.celery_uptime_heartbeat)
[celery-beat] [2024-11-13 00:50:00] [2024-11-13 00:50:00,091: DEBUG/MainProcess] fetch.utils.tasks.celery_uptime_heartbeat sent. id->657d0fd4-222a-474f-9c8f-13142909c69b
The final line here is the running of a scheduled task that I'm using to send heartbeats to an uptime monitor. I set up this task to help diagnose the issue and track when it occurs - there's nothing wrong with this task specifically.
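For context, the heartbeat task itself is trivial; a minimal sketch of it, with the monitor URL replaced by a placeholder:
import urllib.request

from celery import shared_task

@shared_task
def celery_uptime_heartbeat():
    # ping the external uptime monitor; a missed ping shows when beat stopped scheduling
    urllib.request.urlopen("https://uptime-monitor.example.com/ping/PLACEHOLDER", timeout=10)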
- I've looked at the metrics and there is no issue with resources (there is enough RAM and CPU).
- I'm running a similar setup in a separate project (Digital Ocean App Platform) which doesn't have this issue (same Celery Beat versions)
I'm unsure how to investigate the issue further.
Hi, I'm encountering the same issue while running the following setup on AWS:
- django 5.1.4
- django-celery-beat 2.7.0
- redis 7.0.5
- postgres 14.3
Has anyone experienced this before or have any recommendations for troubleshooting?
Thanks!
Do you happen to be using connection pooling? I was using psycopg[pool]. By removing [pool] I was able to stop the issue. Not sure exactly why it was happening.
Thank you! I’m currently using psycopg2==2.9.10 without the [pool] option. I’ll try testing with some different database settings to see if that helps.
Any update on the root cause of this issue?
psycopg2
how did you resolve this?
I have the same problem. I use
celery==5.5.3
hiredis==3.2.1
redis==6.2.0
Django==5.2.3
As the Redis server I use Valkey 8.0.
I start Celery Beat with the command celery -A myapp beat -l INFO.
Today, after a little more than 12 days, Celery Beat is also no longer sending tasks.
I am running Celery Beat in a Kubernetes deployment. When I restart the pod, Celery Beat sends the tasks again.
The same issue with:
- celery==5.5.3
- django-celery-beat==2.5.0
It just gets stuck without any errors:
...
[2025-06-25 02:00:00,137: INFO/MainProcess] Scheduler: Sending due task TASK_NAME
I have the exact same issue; I feel like I could have written this post! Has anyone figured this out yet? I've been trying to fix this for months with no luck :(
I encountered the same issue using DoctorDroid with:
celery==5.5.3
django-celery-beat==2.4.0
The problem is that celery-beat silently stops working without raising any exceptions or emitting logs. I suspect two potential root causes:
- Network-related interruptions.
- Timezone or clock synchronization issues between nodes (my deployment is on a self-managed Kubernetes cluster in a private cloud).
To isolate the issue, I tested scenarios where celery-beat lost its connection to the database. I manually disconnected celery-beat from Redis and PostgreSQL. In those cases, exceptions were logged to the celery-beat log file, and celery-beat attempted to reconnect once the database came back up.
In this case, however, celery-beat simply stops scheduling without any errors or logs.
Our temporary workaround is a sidecar container that continuously monitors the celery-beat log file. If no new log output is detected within an interval (10s, 20s, ...), it restarts the celery-beat process.
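In simplified form the sidecar is just a loop that checks the log file's modification time; the path, threshold, and restart command below are placeholders, as ours is specific to the cluster:
import os
import subprocess
import time

LOG_FILE = "/var/log/celery/beat.log"  # placeholder: path of the celery-beat log file
STALE_AFTER = 60                       # seconds without new log output before restarting
CHECK_EVERY = 10
RESTART_CMD = ["supervisorctl", "restart", "celery-beat"]  # placeholder restart command

while True:
    try:
        stale_for = time.time() - os.path.getmtime(LOG_FILE)
    except FileNotFoundError:
        stale_for = 0  # beat has not written anything yet
    if stale_for > STALE_AFTER:
        # beat looks stuck: trigger a restart and give it time to come back
        subprocess.run(RESTART_CMD, check=False)
        time.sleep(STALE_AFTER)
    time.sleep(CHECK_EVERY)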
I don't know why it worked, but it did when I rolled back to the previous service build. I think it's related to library versions. These versions work for me:
- celery==5.4.0
- django-celery-beat==2.5.0
- redis==5.2.1
- Django==4.2.19
Good idea! Could you share the sidecar implementation?
I have the same problem:
celery=5.5.2
redis=6.1.0
django=5.1.5
I start Celery with the command celery -A myapp beat -l INFO.
I'm having the same problem with library versions already reported in this thread. I'm using Python 3.9; could this issue be related to an interaction between the libraries and a specific Python version?
Just to note that although removing the pool option seemed to help, I'm still occasionally having the same issue on only one of my environments. I have multiple environments deployed (prod, staging,...) and only one is affected leading me to believe this is some sort of issue with the underlying Docker VM and versions.
Also, to potentially help others: I created a Celery task on the host itself to send a heartbeat to UptimeRobot every 5 minutes. This lets me see when celery-beat has stopped, after which I restart it manually. Not ideal, but it helps with debugging.
I posted here recently with the same issue and since then I found a solution to my problem. In my case (also using Digital Ocean app platform) the issue was related to how the Redis managed database would handle idle connections. When the idle connections would be removed but celery would try and store a backend result using this connection, celery beat would silently fail and become corrupted.
The solution was to disable backend results on celery settings (make sure you also remove any environment variables for celery backend results). This immediately fixed my issue.
I can add the specific settings later tonight when I’m back to my computer. Hope this helps!
Interesting, I hadn't considered Redis being the issue (I'm also using DO hosted Redis). If you could share the config later that would be great. Not sure why I'm only experiencing it on one environment. I'll have to check if I have separate Redis configs for prod vs staging. Thanks for the update.
Here are the specifics of my fix.
The issue causing the tasks to stop running seems to have been related to how Digital Ocean managed databases deal with idle connections. I was using Redis for the cache (database 0), for my Celery broker (database 1), and for my Celery result backend (database 2). This all worked fine until some idle connections were closed and Celery then tried to use one of them again to write a backend result. This would somehow put the Celery Beat scheduler into a corrupted state that makes it stop sending new tasks to Celery.
Solution:
Since I'm not using tasks in a way that I actually need the results kept, I completely disabled results in the Celery settings. This involved updating the Django settings to:
CELERY_RESULT_BACKEND = None
CELERY_TASK_IGNORE_RESULT = True
I also removed the environment variable from Digital Ocean to make sure the backend was disabled. When starting up, Celery should report something like this:
transport: redis://redis:6379/0
results: disabled://
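For reference, these CELERY_-prefixed settings are only picked up if the Celery app uses the standard Django integration that reads them from the Django settings, roughly like this (the app name follows the command in the original report; the settings module name is an assumption and your project may differ):
import os

from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "config.settings")  # assumption: your settings module

app = Celery("config")
# picks up CELERY_RESULT_BACKEND / CELERY_TASK_IGNORE_RESULT etc. from the Django settings
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()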
I have a similar issue and can't find anything about it. I'm running celery + celery beat and a redis on a VPS. Beat is set to 10 seconds and stops working after roughly 60 hours.
Having a similar issue. I'm running celery tasks on AWS and sometimes the tasks just get missed without errors. This is because they are not getting scheduled by celery-beat at all
Same here. Hangs randomly, sometimes after 1 hour, sometimes after 2 days. Remedied by restarting container every 30 mins.
python = "^3.13"
Django = "^5.2.5"
celery = "^5.5.3"
django-celery-beat = "^2.8.1"
CELERY_BEAT_SCHEDULER = "django_celery_beat.schedulers:DatabaseScheduler"
CELERY_RESULT_EXTENDED = True
DatabaseScheduler, Redis result backend, and the database accessed through a connection pool with psycopg3.
Tried this, without success:
app.conf.broker_pool_limit = 0
app.conf.broker_channel_error_retry = True
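Given the idle-connection theory above, the Redis transport health-check options might also be relevant; to my knowledge these are passed through to redis-py by the kombu Redis transport, but I haven't verified that they make a difference here:
app.conf.broker_transport_options = {
    "health_check_interval": 25,  # periodically ping broker connections that have been idle
    "socket_keepalive": True,     # enable TCP keepalive on broker connections
}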
Did you try disabling the results backend as mentioned in the message above?
I need the result backend. I'll try changing it to the DB, but Redis is optimal for my use case because of the large number of small tasks.
Tried using the DB result backend and a Redis broker instead of RabbitMQ - the problem remains.
Thanks @jeparalta! I changed the configuration as you suggested six days ago and haven't had any problems since, so that seems to fix the problem for now. However, if the result backend is required, this solution doesn't work.
I'm surprised that celery beat doesn't throw an error. That would be best, because then you could automatically restart the process/pod.
@bast-ii I guess if you need backend results you could still have them on another service, maybe outside of the DO App Platform on a droplet or something. I'm not really sure, as I haven't looked into this, but it could be a possibility.
@Object905 could you not use a separate Redis instance for the results?
I don't need the backend results, so I don't know if that would work. Have you tried to find out what the problem is? Why does it work without the result backend?
Right now I'm using the django-db result backend and the problem remains, so I don't think a separate instance will solve this.
Maybe this issue is in the wrong repository, because we also run into this situation and are not using the django-celery-beat package, for legacy reasons. We only use multiple Celery workers with a beat instance and shelve storage on Django 4.2,
with Redis 8 as broker and result backend.
One note: before we updated, we ran django==3.2 and celery==5.4 with a Redis 7 broker and did not have any problems.
Our versions:
django==4.2.24
django-redis==6.0.0
celery==5.5.3
redis==6.2.0
psycopg2-binary==2.9.10
The broker is a self-hosted Redis 8.
To find some commonalities: everything runs in Docker 24.0.2 containers within a VMware ESX VM with CentOS.
But we found one way to reproduce this situation (in our case): when we simply hard-restart Redis, beat still executes heartbeats (see the log lines below), but no more scheduled tasks are sent, without any errors at "debug" log level.
[2025-09-30 13:19:24,650: DEBUG/MainProcess] {"message": "Server heartbeat succeeded", "topologyId": {"$oid": "68dbbc6c77a29039536a25ae"}, "driverConnectionId": 1, "serverConnectionId": 1661240, "serverHost": "db", "reply": "{\"isWritablePrimary\": true, \"topologyVersion\": {\"processId\": {\"$oid\": \"68b04764690239bf27525385\"}}, \"maxBsonObjectSize\": 16777216, \"maxMessageSizeBytes\": 48000000, \"maxWriteBatchSize\": 100000, \"localTime\": {\"$date\": \"2025-09-30T11:19:24.650Z\"}, \"logicalSessionTimeoutMinutes\": 30, \"connectionId\": 1661240, \"maxWireVersion\": 21, \"ok\": 1.0}"}
[2025-09-30 13:19:24,651: DEBUG/MainProcess] {"message": "Server heartbeat started", "topologyId": {"$oid": "68dbbc6c77a29039536a25ae"}, "driverConnectionId": 1, "serverConnectionId": 1661240, "serverHost": "db", "awaited": true}
[2025-09-30 13:19:34,661: DEBUG/MainProcess] {"message": "Server heartbeat succeeded", "topologyId": {"$oid": "68dbbc6c77a29039536a25ae"}, "driverConnectionId": 1, "serverConnectionId": 1661240, "serverHost": "db", "reply": "{\"isWritablePrimary\": true, \"topologyVersion\": {\"processId\": {\"$oid\": \"68b04764690239bf27525385\"}}, \"maxBsonObjectSize\": 16777216, \"maxMessageSizeBytes\": 48000000, \"maxWriteBatchSize\": 100000, \"localTime\": {\"$date\": \"2025-09-30T11:19:34.660Z\"}, \"logicalSessionTimeoutMinutes\": 30, \"connectionId\": 1661240, \"maxWireVersion\": 21, \"ok\": 1.0}"}
[2025-09-30 13:19:34,661: DEBUG/MainProcess] {"message": "Server heartbeat started", "topologyId": {"$oid": "68dbbc6c77a29039536a25ae"}, "driverConnectionId": 1, "serverConnectionId": 1661240, "serverHost": "db", "awaited": true}
When the log level is not "debug", the log just stops at the last scheduled task sent before we restarted Redis. I would expect some kind of connection error or similar.