When a worker pod is killed, no mechanism for retrying task

Open NikBisht opened this issue 2 years ago • 3 comments

We're using the Redis broker with the DynamoDB backend, and we've noticed that when a worker pod is terminated ungracefully while a task is still running, the task stays in the STARTED state. It seems that Machinery has no timeout after which it re-queues tasks that have been in the STARTED state for a long time. This seems like a critical feature for fault tolerance.

May 11 '22 20:05 NikBisht

I face the same issue here.

Aug 29 '22 11:08 taylorzhangyx

I face the same issue here.

Sep 23 '22 09:09 zhouhui521

Are there any updates or workarounds for this?

Nov 18 '22 11:11 kushalhalder
