kombu
Tasks lost in redis
My original question was posted in celery, but I believe this to be specifically related to kombu and the redis transport.
I have an issue where tasks (scheduled by celery to run in the future) are not being run. I believe the issue is that the messages in redis are being dropped. For example, I saw a message in my logs:
09a20a96-0c1e-478f-b620-4c9404e3c2fc sent to queue.
Looking in the `unacked` HASH in redis, I can see that a message with `"correlation_id": "09a20a96-0c1e-478f-b620-4c9404e3c2fc"` exists - everything is good.
Later in the day, however, looking in the same `unacked` HASH, I no longer see the message for 09a20a96-0c1e-478f-b620-4c9404e3c2fc.
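For reference, this is roughly how I check whether a given correlation_id is still present (a sketch using redis-py; the key name `unacked` is kombu's default, and the HASH values are treated as opaque JSON blobs rather than assuming a particular payload layout):

```python
import redis

r = redis.Redis()  # assumes the broker's default database


def find_unacked(correlation_id, unacked_key="unacked"):
    """Return the delivery tag of the unacked entry containing correlation_id, if any."""
    # The unacked HASH maps delivery_tag -> serialized message; a plain
    # substring search avoids depending on the exact JSON structure.
    for delivery_tag, raw in r.hgetall(unacked_key).items():
        if correlation_id.encode() in raw:
            return delivery_tag
    return None


print(find_unacked("09a20a96-0c1e-478f-b620-4c9404e3c2fc"))
```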
It should be mentioned that during the day the celery workers do get SIGTERM'd, and I see that several unacknowledged messages are restored into the `celery` queue in redis.
I believe the issue lies in the window when celery is shutting down and the messages from `unacked` are being rewritten to `celery`. I see two points of failure here (a sketch of the restore path follows the list):

- The messages from `unacked` -> `celery` are never committed.
- The messages from `unacked` -> `celery` are committed cleanly, but when celery boots up the messages from `celery` -> `unacked` are not committed.
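For context, my understanding of the restore path is roughly the following (a simplified sketch, not kombu's actual implementation; the key names and the assumption that each `unacked` value is a JSON list of `[message, exchange, routing_key]` are based on the transport's defaults and may differ by version):

```python
import json
import redis

r = redis.Redis()


def restore_unacked(unacked_key="unacked", unacked_index_key="unacked_index"):
    """Push every reserved message back onto its originating queue."""
    with r.pipeline() as pipe:
        for tag, raw in r.hgetall(unacked_key).items():
            message, exchange, routing_key = json.loads(raw)
            # For simple direct routing the routing key is the queue name,
            # e.g. "celery".
            pipe.rpush(routing_key, json.dumps(message))
            pipe.hdel(unacked_key, tag)
            pipe.zrem(unacked_index_key, tag)
        # Failure point 1 above: if the worker dies before execute(),
        # nothing has been moved back to the queue.
        pipe.execute()
```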
BTW, I am seeing `WorkerLostError` errors and `verify_process_alive` errors from `celery.concurrency.asynpool` in my logs, but I'm not sure whether they're related to my question here.
It is a known issue that the worker may lose up to one message if abruptly terminated.
With ack emulation it will reserve one message and then add it to the backup hash, but these operations happen in different transactions. I don't think it can lose more than one message, and it will only do so if 1) shutdown does not complete, or 2) the redis server goes offline before the second operation completes.
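Conceptually the reserve step is ordered like this (a sketch of the ordering only, not the actual transport code; the key names and the assumption that the queued payload carries `properties.delivery_tag` are illustrative):

```python
import json
import time
import redis

r = redis.Redis()


def reserve_one(queue="celery", unacked_key="unacked",
                unacked_index_key="unacked_index"):
    """Illustrates the ordering: pop first, record in the backup hash second."""
    # Step 1: the message leaves the queue.
    popped = r.brpop(queue, timeout=1)
    if popped is None:
        return None
    _, message = popped
    delivery_tag = json.loads(message)["properties"]["delivery_tag"]

    # Window: if the process dies or redis becomes unreachable here,
    # the message exists in neither the queue nor the backup hash.

    # Step 2: the message is recorded as unacked, in a separate transaction.
    with r.pipeline() as pipe:
        pipe.zadd(unacked_index_key, {delivery_tag: time.time()})
        pipe.hset(unacked_key, delivery_tag, message)
        pipe.execute()
    return message
```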
I imagine there is a solution to this problem, but I don't have much time to work on the redis transport. There is a command in the redis API that is designed for this problem (http://redis.io/commands/rpoplpush), but it's useless for us as it does not let us consume from multiple keys.
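For reference, RPOPLPUSH does make the pop-and-record step atomic, but only for a single source key, which is why it doesn't fit a consumer watching several queues at once (a sketch with redis-py; the `celery.processing` backup key is hypothetical):

```python
import redis

r = redis.Redis()

# Atomic for one queue: the message is moved to the processing list in a
# single server-side operation, so it is never "in flight" nowhere.
message = r.rpoplpush("celery", "celery.processing")

# A worker consuming from several queues (e.g. "celery", "high", "low") would
# need one RPOPLPUSH per key, losing the blocking wait that BRPOP provides
# across many keys as well as the single atomic hand-off.
```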
I hope this bug will be fixed soon... Any long-running task will be rescheduled to run again if the worker gets a shutdown signal before the task finishes. After a graceful shutdown the task goes back to the queue because it was not acknowledged...
May I know if there are any updates on this issue?