yii2-queue
Infinite retry loop in RetryableJob because canRetry/attempt is not obeyed when the Job/Worker segfaults
What steps will reproduce the problem?
I am working on getting this info. It happens on a live system with a few thousand jobs per day where a few hundred segfault and get re-queued indefinitely.
The job implements `\yii\queue\RetryableJobInterface` and has: `public function canRetry($attempt, $error) { return ($attempt < 3) && ($error instanceof TemporaryException); }`
What's expected?
I'm not sure whether the segfault is a Queue issue, but the "Attempts" mechanism should at least work so that we do not end up in an infinite loop. A job should really not be retried more than twice, yet I see the attempt counter (in the logs) reach 400+, at which point I have to flush the queue to stop it.
What do you get instead?
Infinite re-queuing. The segfault must happen in a very awkward place, between the attempt counter being incremented and the `canRetry` call.
Additional info
Using Redis queue.
| Q | A |
|---|---|
| Yii version | 2.0.27 |
| PHP version | v7.0.33-0+deb9u5 |
| Operating system | Linux 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3 (2019-09-02) x86_64 GNU/Linux |
A lot of jobs are left in the reserved state, which is also where the attempt counter is incremented via `hincrby` in the Redis driver. I believe these are all jobs that segfaulted and then get re-run.
It seems that the segfault occurs after the job finishes (at the garbage collection stage) in the Zend memory manager, similar to documented bugs such as https://bugs.php.net/bug.php?id=71662
Switching off the Zend memory manager with `USE_ZEND_ALLOC=0` stops the segfaults.
The remaining question is whether the queue manager can deal with a segfault in the job and still behave as expected in terms of queue/attempt management.
No, it can't. A segfault can't be caught.
I don't think the segfault needs to be caught. My thoughts are more along the lines of adjusting the attempt increment/retry logic, so that there is a safeguard before the job runs rather than after.
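To make the idea concrete, here is a minimal, self-contained sketch of a "check before run" safeguard. It does not use yii2-queue at all: `FakeQueue`, `expireReservation()`, `runCrashingJob()` and the `$maxAttempts` cap are hypothetical names invented for illustration, and the in-memory arrays only stand in for the Redis structures. It assumes, as described above, that the attempt counter is bumped at reserve time and that a job whose worker died is eventually put back into the waiting list.

```php
<?php
// Hypothetical in-memory model; none of these names come from yii2-queue.
final class FakeQueue
{
    /** @var int[] attempt counter per job id (stand-in for the Redis attempts hash) */
    public $attempts = [];
    /** @var string[] job ids waiting to be reserved */
    public $waiting = [];
    /** @var string[] job ids dropped because they exceeded the cap */
    public $dropped = [];
    /** @var int|null hard cap checked before the job runs; null disables the guard */
    private $maxAttempts;

    public function __construct($maxAttempts = null)
    {
        $this->maxAttempts = $maxAttempts;
    }

    public function push($id)
    {
        $this->waiting[] = $id;
        $this->attempts[$id] = 0;
    }

    /** Reserve the next runnable job id and bump its attempt counter; null if none. */
    public function reserve()
    {
        while (($id = array_shift($this->waiting)) !== null) {
            $attempt = ++$this->attempts[$id];
            // The safeguard: decided before the job runs, so it does not
            // depend on the worker surviving long enough to call canRetry().
            if ($this->maxAttempts !== null && $attempt > $this->maxAttempts) {
                $this->dropped[] = $id;
                continue;
            }
            return $id;
        }
        return null;
    }

    /** Model a worker that segfaulted: the reservation lapses and the job is re-queued. */
    public function expireReservation($id)
    {
        $this->waiting[] = $id;
    }
}

/** Run a job that crashes every time; returns how often it was actually started. */
function runCrashingJob(FakeQueue $queue, $rounds)
{
    $runs = 0;
    for ($i = 0; $i < $rounds; $i++) {
        $id = $queue->reserve();
        if ($id === null) {
            break;
        }
        $runs++;                        // the job starts, then the worker dies
        $queue->expireReservation($id); // canRetry() is never reached
    }
    return $runs;
}

$unguarded = new FakeQueue();           // no cap: only canRetry() could stop it
$unguarded->push('job-1');
$unguardedRuns = runCrashingJob($unguarded, 50); // started in all 50 rounds

$guarded = new FakeQueue(3);            // cap of 3 attempts, enforced at reserve time
$guarded->push('job-1');
$guardedRuns = runCrashingJob($guarded, 50);     // started 3 times, then dropped
```

In this model the unguarded queue keeps restarting the crashing job for as long as the loop runs, while the guarded one stops after three starts. Where such a cap would live in the real driver, and what should happen to a dropped job, are open design questions that this sketch does not answer.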
Do you have an idea about implementation?
I'll have a look