[Feature] Do not allow removing a delayed job that is part of a repeatable meta job.

Open manast opened this issue 1 year ago • 6 comments

Is your feature request related to a problem? Please describe.
If a delayed job that represents the next iteration of a repeatable job is deleted by mistake, the repeatable job stops repeating, and the only remedy is to add the repeatable job again.

Describe the solution you'd like
An exception or error should be raised if we try to remove such a delayed job, whether with the method for removing jobs or with any other method that could indirectly remove a delayed job. The only way to remove a delayed job that is part of a repeatable job should be to remove the repeatable job altogether.
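As a sketch of the proposed behaviour, here is a tiny in-memory model in TypeScript. It is not BullMQ code: the `DelayedStore` class and its `removeJob`/`removeRepeatable` methods are hypothetical, and `repeatJobKey` is used here only as an illustrative marker for delayed jobs that belong to a repeatable job. Generic removal throws for such jobs, and only removing the whole repeatable job deletes its delayed iteration.

```typescript
// Hypothetical in-memory model of the proposed guardrail; NOT BullMQ's API.

interface DelayedJob {
  id: string;
  // Illustrative marker: set when this delayed job is the next iteration of a repeatable job.
  repeatJobKey?: string;
}

class DelayedStore {
  private jobs = new Map<string, DelayedJob>();

  add(job: DelayedJob): void {
    this.jobs.set(job.id, job);
  }

  has(id: string): boolean {
    return this.jobs.has(id);
  }

  // Generic removal: refuses to delete a job that belongs to a repeatable job.
  removeJob(id: string): void {
    const job = this.jobs.get(id);
    if (job === undefined) return;
    if (job.repeatJobKey !== undefined) {
      throw new Error(
        `Job ${id} belongs to repeatable job ${job.repeatJobKey}; remove the repeatable job instead`,
      );
    }
    this.jobs.delete(id);
  }

  // The only path allowed to delete a repeatable job's delayed iteration(s).
  removeRepeatable(repeatJobKey: string): number {
    let removed = 0;
    this.jobs.forEach((job, id) => {
      if (job.repeatJobKey === repeatJobKey) {
        this.jobs.delete(id); // deleting the current entry during forEach is safe
        removed++;
      }
    });
    return removed;
  }
}

const store = new DelayedStore();
store.add({ id: "repeat:k1:1727846100000", repeatJobKey: "k1" });
store.add({ id: "plain-1" });

store.removeJob("plain-1"); // an ordinary delayed job is removed
try {
  store.removeJob("repeat:k1:1727846100000"); // rejected by the guardrail
} catch (err) {
  console.log((err as Error).message);
}
console.log(store.has("repeat:k1:1727846100000")); // true
console.log(store.removeRepeatable("k1")); // 1
```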

Describe alternatives you've considered
None

Additional context
A user reported that their repeatable jobs stopped working and that the delayed job associated with the repeatable job was missing. We should make sure this cannot happen because of a user mistake, in any case.

manast, Sep 28 '24 08:09

I was able to reproduce the problem, and I’m sharing the steps I followed below as I think they might help with the solution:

  1. A repeatable job was created. [image: repeated-job-added]
  2. The job completed successfully 3 times. [image: final-result-in-task-force]
  3. For a reason we don't understand, after the 3rd job completed, a ‘drained’ event occurred. As a result, the repeatable job's entry in ‘delayed’ is deleted. The number of completed jobs varies (sometimes 6 to 10 jobs complete), but in the end the repeatable job is always removed. [image: stream-logs-drained]

Additionally, this is the state of the keys in Redis: [image: final-result-in-redis]

erenkurnaz, Oct 01 '24 15:10

I wonder what settings you are using for this repeatable job, e.g. which cron expression or "every" interval?

manast, Oct 01 '24 20:10

By the way, the "drained" event is only triggered when there are no jobs left waiting in the queue; it is not the reason the next delayed job disappears or is not created. Furthermore, the delayed job for the next iteration is created before the current job starts processing, so whatever happens to the job being processed, the next delayed job should be there. It would be interesting to see which repeat options you are using, to check whether something there stands out.
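The ordering described above can be illustrated with a small in-memory model. This is a simplification, not BullMQ's actual implementation, and `RepeatModel`, `promote` and `processNext` are hypothetical names: the next delayed iteration is scheduled when the current one is taken for processing, so an empty waiting list ("drained") leaves it intact, while deleting the delayed job directly ends the series.

```typescript
// Simplified model of repeatable series; NOT BullMQ's internals. Names are hypothetical.

interface Iteration {
  key: string; // repeatable job key
  runAt: number; // millis at which this iteration becomes due
}

class RepeatModel {
  delayed = new Map<string, Iteration>(); // key -> next scheduled iteration
  waiting: Iteration[] = [];
  completed = 0;
  private readonly every: number;

  constructor(every: number) {
    this.every = every;
  }

  addRepeatable(key: string, firstRunAt: number): void {
    this.delayed.set(key, { key, runAt: firstRunAt });
  }

  // Move every due delayed iteration to the waiting list.
  promote(now: number): void {
    this.delayed.forEach((it, key) => {
      if (it.runAt <= now) {
        this.delayed.delete(key);
        this.waiting.push(it);
      }
    });
  }

  // Take one waiting job. The next iteration is scheduled BEFORE the job is
  // processed, so the outcome of processing cannot prevent it.
  // Returns false when nothing is waiting, i.e. the queue is "drained".
  processNext(): boolean {
    const job = this.waiting.shift();
    if (job === undefined) return false;
    this.delayed.set(job.key, { key: job.key, runAt: job.runAt + this.every });
    this.completed++; // stand-in for running the processor
    return true;
  }
}

const every = 60000;
const m = new RepeatModel(every);
m.addRepeatable("k1", 0);
for (let t = 0; t <= 3 * every; t += every) {
  m.promote(t);
  while (m.processNext()) {
    // keep processing until the waiting list is empty ("drained")
  }
  // The queue is drained here, yet the next iteration is still delayed.
}
console.log(m.completed); // 4
console.log(m.delayed.get("k1")?.runAt); // 240000

// A stray removal of the delayed job breaks the chain: nothing is promoted again.
m.delayed.delete("k1");
m.promote(10 * every);
console.log(m.processNext()); // false
```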

manast, Oct 01 '24 21:10

Here's a sample repeatable job. We use a separate Redis instance for the sandbox. We were testing with just one job, so there was nothing in the delayed tab. Maybe that's why the queue was drained, but I still can't figure out why a repeatable job would be drained when there's no delayed job.

```json
{
  "attempts": 0,
  "delay": 59999,
  "prevMillis": 1727846100000,
  "timestamp": 1727846095466,
  "repeat": {
    "offset": 55465,
    "key": "xx_66b4e50636054e50b75ba8f9",
    "every": 60000,
    "count": 2914
  },
  "removeOnFail": {
    "count": 100
  },
  "jobId": "repeat:xx_66b4e50636054e50b75ba8f9:1727846100000",
  "removeOnComplete": {
    "count": 100
  }
}
```
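To compare the sample options above with the keys in Redis, one can derive the id that the next delayed iteration would have. In the sample, `jobId` equals `"repeat:" + repeat.key + ":" + prevMillis`; the sketch below additionally assumes that the next iteration runs at `prevMillis + every`, which is a simplification (BullMQ's real calculation may also depend on other options such as `offset`). The helpers `parseRepeatJobId` and `expectedNextJobId` are hypothetical and not part of BullMQ.

```typescript
// Hypothetical helpers, NOT part of BullMQ.
// ASSUMPTION: the next iteration runs at prevMillis + every, and ids follow
// the "repeat:<key>:<millis>" pattern seen in the sample job above.

interface RepeatSample {
  prevMillis: number;
  jobId: string;
  repeat: { key: string; every: number };
}

function parseRepeatJobId(jobId: string): { key: string; millis: number } {
  // Greedy key match, then the trailing ":<digits>" is the timestamp.
  const match = /^repeat:(.+):(\d+)$/.exec(jobId);
  if (match === null) throw new Error(`not a repeat job id: ${jobId}`);
  return { key: match[1], millis: Number(match[2]) };
}

function expectedNextJobId(opts: RepeatSample): string {
  const { key, millis } = parseRepeatJobId(opts.jobId);
  // Sanity check: the id should encode the same key and prevMillis as the options.
  if (key !== opts.repeat.key || millis !== opts.prevMillis) {
    throw new Error("jobId does not match repeat.key / prevMillis");
  }
  return `repeat:${key}:${opts.prevMillis + opts.repeat.every}`;
}

const sample: RepeatSample = {
  prevMillis: 1727846100000,
  jobId: "repeat:xx_66b4e50636054e50b75ba8f9:1727846100000",
  repeat: { key: "xx_66b4e50636054e50b75ba8f9", every: 60000 },
};

console.log(expectedNextJobId(sample));
// repeat:xx_66b4e50636054e50b75ba8f9:1727846160000
```

Under that assumption, the absence of a delayed job with the computed id would mean the chain of iterations has been broken.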

hamzauzumcu, Oct 02 '24 05:10

By any chance, can you reproduce this problem locally?

manast, Oct 02 '24 07:10

I also wonder whether you are doing things like updating the repeatable job's options or data.

manast, Oct 02 '24 08:10

We have refactored and improved repeatable jobs into what we now call "Job Schedulers". They work the same way as before, but the API is cleaner and more robust. We have also added guardrails so that a delayed job belonging to a job scheduler cannot easily be removed by mistake. The new documentation is here: https://docs.bullmq.io/guide/job-schedulers

I recommend upgrading to these new methods; let's see whether you still run into this issue afterwards.

manast, Oct 06 '24 09:10

This is now completed. If you still run into problems, please open a new issue, since the feature that prevents removing delayed jobs that belong to a job scheduler is now ready.

manast, Oct 07 '24 19:10