
High-availability real-time Celery monitoring

Open utterances-bot opened this issue 2 years ago • 3 comments

High-availability real-time Celery monitoring - Alex Pearce

A guide on creating multiple Celery event receivers for highly available real-time worker and task monitoring.

https://alexpearce.me/2022/07/high-availability-celery-monitoring/

utterances-bot avatar Dec 12 '22 22:12 utterances-bot

I have a Celery application that runs in a Docker container and a Celery worker that runs in a separate Docker container. The event receiver is written in a similar way to what you have above, but while listening for events I get a "socket.timeout: timed out" message.

Are there any ports I need to expose? Is there something I am missing?

behm avatar Dec 12 '22 22:12 behm

Is this message emitted by the app.events.Receiver receiver? Hmm. What broker are you using? Are the application and worker able to connect to the broker?

alexpearce avatar Dec 13 '22 15:12 alexpearce

Alex, I figured out my problem. Something dumb on my part in the way I was sharing code between the containers. I did have another question though.

We have seen cases where workers are killed unexpectedly (mostly out-of-memory) and we lose a task. I am trying to add an "audit check" to our system that looks through our database of "jobs" to see if incomplete tasks in a job are accounted for in Celery.

Once you receive an event into the EventReceiver, does it get stored anywhere? I basically want to ask Celery if it knows anything about a specific task_id and, if not, I would "re-queue" it. I have tried to use AsyncResult but I always get a status of PENDING no matter what state the task is in.

Thanks, Brian

behm avatar Dec 19 '22 22:12 behm
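
For the audit check described above, one possible approach is to have the event handler record what it sees, since a receiver only passes each event to its handlers and keeps nothing itself. The sketch below is a minimal, standard-library-only illustration: `TaskEventLog`, `handle`, and `unaccounted` are hypothetical names, and the only assumption about Celery is that task events are dictionaries carrying the task ID under `uuid` and the event name under `type`. In a real setup, `log.handle` would presumably be registered as the `'*'` handler on the receiver, and the in-memory dictionary could be swapped for the jobs database.

```python
from typing import Dict, Iterable, List


class TaskEventLog:
    """Hypothetical in-memory record of the latest event type per task ID."""

    def __init__(self) -> None:
        self.latest: Dict[str, str] = {}

    def handle(self, event: dict) -> None:
        # Assumption: task events carry the task ID under "uuid" and the
        # event name (e.g. "task-received") under "type". Anything else,
        # such as worker heartbeats, is ignored.
        if event.get("type", "").startswith("task-") and "uuid" in event:
            self.latest[event["uuid"]] = event["type"]

    def unaccounted(self, task_ids: Iterable[str]) -> List[str]:
        """Return the IDs for which no task event has been recorded."""
        return [tid for tid in task_ids if tid not in self.latest]


# Simulated events standing in for what a receiver would deliver.
log = TaskEventLog()
log.handle({"type": "task-received", "uuid": "a1"})
log.handle({"type": "task-succeeded", "uuid": "a1"})
log.handle({"type": "worker-heartbeat", "hostname": "w1"})

# "b2" was never seen, so it is a candidate for re-queueing.
missing = log.unaccounted(["a1", "b2"])
print(missing)
```

A log like this only knows about tasks whose events arrived while the receiver was running, and a task still waiting in the queue may not have emitted any event yet, so "unaccounted" is a hint to investigate rather than proof that the task was lost. On the AsyncResult side, Celery reports PENDING for any task ID it has no stored state for, which would be consistent with a missing or misconfigured result backend.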