
Latest Honest Status

Open · ja-gooding opened this issue 2 years ago · 3 comments

What is the honest state of this project, and how well are its high-availability (HA) capabilities covered by unit tests?

As far as I am concerned, "django-celery-beat" is a lost cause. It has had issues with basic functionality, especially High Availability, since 2013 and even earlier: https://github.com/celery/celery/issues/1495#issuecomment-374779390.

It will never be fixed, and I doubt it can even be improved, because I'm not sure it was engineered from the start to allow that.

I am looking for a project that lets me schedule periodic tasks in the Django ORM with second and millisecond accuracy, just like "django-celery-beat", while also offering modern, optional high availability (HA) / fault tolerance for when the primary beat process fails or needs to move to a different node (e.g., Docker Swarm services or Kubernetes pods) for whatever reason (software, hardware, maintenance, disaster recovery, etc.).

Please let me know if this isn't the right project or if one does not exist.

ja-gooding · Sep 18 '22 22:09

Per the license, https://github.com/sibson/redbeat/blob/main/LICENSE#L145, this software is provided "AS IS". I personally don't use the cluster mode; support for it was added by others. I can't comment on how they are using it, so I can only assume it's meeting their needs. HA design is complex: both your application and your infrastructure need to be designed to achieve HA across the scenarios you identify. I'd suggest reviewing the code yourself to determine whether it meets your needs. If you discover issues and can provide reasonable patches, I will integrate them.

sibson · Sep 23 '22 00:09

Thank you for the response. I’m an academic researcher and will get back to you on closing this issue out after a more formal evaluation.

ja-gooding · Sep 23 '22 00:09

Hi! I've been running redbeat in production as a scheduler for non-critical tasks for about a year now, and we found it to be generally capable and stable. We were able to scale up to tens of thousands of tasks, with 100+ tasks dispatched every second from a 1-core instance, and a standby beat instance waiting on the lock in case the main beat instance fails.
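The standby arrangement described above behaves like a lease: whichever instance holds an expiring lock dispatches, and the other keeps retrying. Below is a deterministic, in-memory simulation under stated assumptions. `LeaseLock` and `simulate` are hypothetical stand-ins (a Python object playing the role of a Redis key with an expiry, not redbeat's implementation), one loop iteration models a one-second beat tick, and the running instance is assumed to renew its lock on every tick.

```python
from typing import List, Optional


class LeaseLock:
    """In-memory stand-in for a Redis key with an expiry (hypothetical; not redbeat's code)."""

    def __init__(self) -> None:
        self.owner: Optional[str] = None
        self.expires_at = 0

    def acquire(self, who: str, ttl: int, now: int) -> bool:
        # Succeeds if the lock is free, has expired, or is already ours (a renewal).
        if self.owner is None or now >= self.expires_at or self.owner == who:
            self.owner = who
            self.expires_at = now + ttl
            return True
        return False


def simulate(lock_timeout: int, primary_dies_at: int, until: int) -> List[Optional[str]]:
    """Return which instance holds the lock (and may dispatch) at each one-second tick."""
    lock = LeaseLock()
    leaders: List[Optional[str]] = []
    for t in range(until + 1):
        if t < primary_dies_at and lock.acquire("primary", lock_timeout, t):
            leaders.append("primary")
        elif lock.acquire("standby", lock_timeout, t):
            leaders.append("standby")
        else:
            leaders.append(None)  # primary is gone but its lock has not expired yet
    return leaders


leaders = simulate(lock_timeout=5, primary_dies_at=10, until=20)
print(leaders.index("standby"))  # 14: last renewal at t=9, plus the 5 s timeout
```

With a 5 s timeout the primary's last renewal happens at t=9, so the standby only takes over at t=14 and nothing is dispatched in between; a shorter timeout narrows that gap but leaves less headroom for a slow tick.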

  • We aren't using clustered Redis, for simplicity's sake, but we had a way to automatically refill the tasks if the Redis instance got restarted, and that got us most of the way towards high availability despite the single Redis instance.
  • If you have lots of task dispatches, monitor how long each tick takes and tune the redbeat_lock_timeout parameter accordingly, so that your instance can get through all the tasks within the timeout.
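A rough way to act on the tuning advice is to time each tick and compare the worst case with the configured timeout. This sketch assumes the running beat renews its lock roughly once per tick, so a tick that outlasts `redbeat_lock_timeout` could let the lock lapse and allow the standby to start dispatching in parallel. `time_call`, `check_lock_timeout` and the 2x `safety_factor` are hypothetical helpers and an arbitrary margin, not part of redbeat, and the tick durations in the usage line are made-up numbers.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def time_call(fn: Callable[[], object]) -> float:
    """Wall-clock duration of one call, e.g. one scheduler tick (hooking it in is up to you)."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def check_lock_timeout(tick_seconds: List[float], lock_timeout: float,
                       safety_factor: float = 2.0) -> Dict[str, object]:
    """Compare measured tick durations with a configured redbeat_lock_timeout value.

    safety_factor is an arbitrary illustrative margin, not a redbeat setting.
    """
    worst = max(tick_seconds)
    suggested = math.ceil(worst * safety_factor)
    return {
        "median_tick": statistics.median(tick_seconds),
        "worst_tick": worst,
        "suggested_timeout": suggested,
        "ok": lock_timeout >= suggested,
    }


report = check_lock_timeout([0.8, 1.2, 3.5, 2.0], lock_timeout=5)
print(report["suggested_timeout"], report["ok"])  # 7 False
```

Keying the check on the worst tick rather than the median reflects the failure mode: a single slow tick is enough for the lock to expire.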

Also, millisecond accuracy is likely a myth in Celery: by the time a worker receives the task and starts executing it, tens of milliseconds have likely gone by.
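One way to check that claim on your own deployment is to record, for each run, when it was supposed to fire and when the worker actually started it, then look at the distribution rather than a single value. How those two timestamps are captured is deployment-specific and not shown here; `lateness_report` is a hypothetical helper, and the timestamps in the usage line are made up.

```python
import statistics
from typing import Dict, List, Tuple


def lateness_report(runs: List[Tuple[float, float]]) -> Dict[str, float]:
    """Summarise how late each run started, in milliseconds.

    runs: (scheduled_at, started_at) pairs in epoch seconds; needs at least two runs.
    """
    lateness_ms = [(started - scheduled) * 1000.0 for scheduled, started in runs]
    # 99 cut points; "inclusive" keeps every estimate inside the observed range.
    cuts = statistics.quantiles(lateness_ms, n=100, method="inclusive")
    return {
        "min_ms": min(lateness_ms),
        "p50_ms": cuts[49],
        "p99_ms": cuts[98],
        "max_ms": max(lateness_ms),
    }


report = lateness_report([(0.0, 0.010), (1.0, 1.020), (2.0, 2.030), (3.0, 3.040)])
print(round(report["p50_ms"], 3))  # 25.0
```

If even the minimum lateness is in the tens of milliseconds, scheduling at millisecond granularity buys nothing in practice.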

chenseanxy · Sep 23 '22 14:09