
Retry failed jobs

Open gravis opened this issue 10 years ago • 5 comments

It would be nice to have features like the ones Sidekiq provides (https://github.com/mperham/sidekiq/wiki/Error-Handling), especially retrying failed jobs. Something like:

"If you don't fix the bug within 25 retries (about 21 days), Sidekiq will stop retrying and move your job to the Dead Job Queue. You can fix the bug and retry the job manually anytime within the next 6 months using the Web UI."

gravis avatar Sep 03 '14 08:09 gravis

Doesn't Resque do this though?

Or do a rescue in Go?

cdrage avatar Mar 09 '15 20:03 cdrage

Hi, I am experimenting with the goworker library. I need to be able to stop and start jobs.

Is that possible with the current version? If not, can anyone suggest a workaround?

rohit4813 avatar Nov 06 '17 05:11 rohit4813

@rohit4813 The current implementation listens for a few signals, and if it receives one, it stops enqueuing new jobs but lets the running jobs finish.

I'm not sure I understand exactly what you're trying to do, but here's our use: we have a scenario where each job is potentially pretty long but has a natural stopping point. For this, we create a channel in the main function that gets written into when a signal is received (basically the same code that is in goworker already). Then we create another channel, this time buffered (capacity = number of workers), and pass that channel to each worker. Workers then select from that channel at natural stopping points. In a separate goroutine (kicked off from the main function), we read from the signals channel and write N times (N = number of workers) to the buffered channel.

The whole flow looks like:

  1. The process receives a signal
  2. Both our signals channel and goworker's signals channel are written into
  3. Goworker stops enqueuing new jobs
  4. We copy the event N times into the buffered channel, once per worker
  5. When any worker finishes, it's handled normally
  6. When a worker gets to a checkpoint where it checks the channel, it returns and goworker takes care of it

mingan avatar Nov 06 '17 07:11 mingan

@mingan Thanks for the great explanation.

If I understand correctly, all the workers will read from the workers channel (which gets populated from the signals channel)?

I have a use case where I want to stop one or more specific workers, say ones that have been running for a very long time. With this approach, though, all the workers will stop once the signal is passed to the channel.

I can identify the worker on which the job is running for a very long time. How can I send the signal to this particular worker?

I hope this is not confusing. Am I missing something?

rohit4813 avatar Nov 06 '17 08:11 rohit4813

@rohit4813 Our use case just creates breakpoints in long-running jobs so that when we need to restart the process, we don't have to wait (tens of) minutes for the whole job to finish.

If you needed to discriminate between workers, I guess you could do that by sending some meaningful value through the channel and then the worker would decide "this msg is meant for me, I'll stop" or "this is meant for the slow one over there, I can keep running". Though, I can't imagine the use case for such behaviour.
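
A rough sketch of that idea, under a few assumptions: workers are identified by an integer ID, and each worker gets its own buffered inbox instead of one shared channel, because with a single shared channel a worker could consume a message meant for another one. `broadcast` and `shouldStop` are hypothetical helper names, not part of goworker.

```go
package main

import "fmt"

// broadcast sends the ID of the worker that should stop to every inbox.
// The send is non-blocking: a full inbox is skipped rather than waited on.
func broadcast(inboxes []chan int, target int) {
	for _, in := range inboxes {
		select {
		case in <- target:
		default:
		}
	}
}

// shouldStop is called by a worker at a checkpoint. It drains the pending
// messages and reports whether any of them is addressed to this worker.
func shouldStop(id int, inbox <-chan int) bool {
	for {
		select {
		case target := <-inbox:
			if target == id {
				return true // "this msg is meant for me, I'll stop"
			}
			// meant for another worker: ignore and keep draining
		default:
			return false // nothing pending, keep running
		}
	}
}

func main() {
	inboxes := []chan int{make(chan int, 4), make(chan int, 4)}

	broadcast(inboxes, 1) // ask only worker 1 to stop

	for id, in := range inboxes {
		fmt.Printf("worker %d stops: %v\n", id, shouldStop(id, in))
	}
	// prints:
	// worker 0 stops: false
	// worker 1 stops: true
}
```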

mingan avatar Nov 06 '17 09:11 mingan