Add ability to access process identity within the job

Open heaven opened this issue 1 year ago • 2 comments

Hi, I know jobs should be fast and idempotent and should not rely on any state, but reality doesn't always match. Some of our jobs run for quite a while, so they get returned to the queue (e.g. during deployment). We also need to make sure only one job processes a task at a time, so we set the task status to in_progress once it starts. If another job picks up the same task, it first locks the task in the database and checks its status; if the status is in_progress, it takes no action.

The problem is that if the job hasn't finished, the task remains in_progress, since there is no mechanism to notify workers to stop when Sidekiq moves into "quiet" mode. When the same job is restarted later, it sees the task is in_progress and quits. We solve this by saving the jid on the task and comparing it with the jid of the running job; if they match, we allow the job to proceed despite the in_progress status.
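A minimal sketch of the jid check described above, written in plain Ruby so it runs on its own. `Task`, `claim`, and the `Mutex` standing in for the database row lock are all hypothetical names for illustration, not part of Sidekiq or of the poster's code.

```ruby
# Hypothetical in-memory stand-in for a database-backed task record.
# The Mutex plays the role of the short-lived row lock.
Task = Struct.new(:status, :jid, :lock) do
  def with_lock(&block)
    lock.synchronize(&block)
  end
end

# Returns true if the job identified by `jid` may work on the task.
def claim(task, jid)
  task.with_lock do
    if task.status == :in_progress
      # A matching jid means the same job was interrupted and is being re-run.
      task.jid == jid
    else
      task.status = :in_progress
      task.jid = jid
      true
    end
  end
end

task = Task.new(:pending, nil, Mutex.new)
claim(task, "abc123") # first run claims the task
claim(task, "def456") # a different job is refused
claim(task, "abc123") # the restarted job is allowed to proceed
```

The lock is held only while the status and jid are read or written, not for the duration of the job.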

But sometimes the process crashes for reasons we can't control, and then we are in trouble: the jobs are sometimes gone forever and the tasks are stuck forever. In addition to the jid, we would also like to store the identity of the process that started the task. Then a scheduled job could recover such orphaned tasks by simply checking whether the process still exists. Once the process crashes, its identity in Redis expires after a short time and the task can be recovered.
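The recovery job proposed above could look roughly like this. It is a sketch under assumptions: `OwnedTask`, `orphaned_tasks`, and `recover_orphans!` are hypothetical, tasks are plain in-memory structs, and the set of live process identities is passed in rather than read from Redis.

```ruby
require "set"

# Hypothetical task record that also stores the identity of the owning process.
OwnedTask = Struct.new(:id, :status, :jid, :process_identity)

# Tasks still marked in progress whose owning process is no longer alive.
def orphaned_tasks(tasks, live_identities)
  tasks.select do |t|
    t.status == :in_progress && !live_identities.include?(t.process_identity)
  end
end

# Resets orphaned tasks so another job can pick them up; returns the tasks it reset.
def recover_orphans!(tasks, live_identities)
  orphaned_tasks(tasks, live_identities).each do |t|
    t.status = :pending
    t.jid = nil
    t.process_identity = nil
  end
end

# In a real app the live identities might come from Sidekiq's API
# (for example Sidekiq::ProcessSet from sidekiq/api); that wiring is an
# assumption and is not shown here.
tasks = [
  OwnedTask.new(1, :in_progress, "j1", "host-a:1:abc"),
  OwnedTask.new(2, :in_progress, "j2", "host-b:2:def")
]
recover_orphans!(tasks, Set["host-a:1:abc"])
```

Only the task whose recorded identity is missing from the live set is reset; the other stays in progress.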

Is that something you'd be willing to add to Sidekiq?

heaven avatar Mar 06 '24 22:03 heaven

That's a good summary of Sidekiq Pro's super_fetch. The identity is only available within each Sidekiq process; it's not available in the client pushing the job to Redis.

As for your in_progress status, I would use a database row lock instead. If your process dies, the lock should go away quickly.

https://www.postgresql.org/docs/current/explicit-locking.html

mperham avatar Mar 07 '24 16:03 mperham

Indeed, we use this approach often, but in this case the lock would interfere with other processes and hurt performance. Plus, these locks are held within transactions, which brings other limitations.

Tracking the identity of the responsible process seemed the easiest fix, since we then only lock the record briefly to change its status and save the jid and identity. We do this when the job starts, so the identity should be known at that point.

If exposing the identity is too much pain, then a separate table storing the task_id would probably do the trick: we could lock the row there without interfering with other processes or with the web app, which may also need to access the record.

heaven avatar Mar 07 '24 18:03 heaven