Laravel-Queue-Monitor icon indicating copy to clipboard operation
Laravel-Queue-Monitor copied to clipboard

Job stuck on Running

Open ldmuniz opened this issue 3 years ago • 7 comments

Sometimes, when the job hit the timeout, another attempt is created and the original Job stucks on "Running". I suggest a little change in jobStarted function at romanzipp\QueueMonitor\Services\QueueMonitor.php

Original:

protected static function jobStarted(JobContract $job): void
    {
        if ( ! self::shouldBeMonitored($job)) {
            return;
        }

        $now = Carbon::now();

        $model = self::getModel();

        $model::query()->create(['job_id' => self::getJobId($job)],[            
            'name' => $job->resolveName(),
            'queue' => $job->getQueue(),
            'started_at' => $now,
            'started_at_exact' => $now->format(self::TIMESTAMP_EXACT_FORMAT),
            'attempt' => $job->attempts(),
        ]);
    }

Suggestion:

    protected static function jobStarted(JobContract $job): void
    {
        if ( ! self::shouldBeMonitored($job)) {
            return;
        }

        $now = Carbon::now();

        $model = self::getModel();

        $model::query()->updateOrCreate(['job_id' => self::getJobId($job)],[            
            'name' => $job->resolveName(),
            'queue' => $job->getQueue(),
            'started_at' => $now,
            'started_at_exact' => $now->format(self::TIMESTAMP_EXACT_FORMAT),
            'attempt' => $job->attempts(),
        ]);
    }

ldmuniz avatar Jan 21 '21 19:01 ldmuniz

Sadly I can not reproduce this issue.

When executing the following test job which takes 30 seconds to run but specifying --timeout=5 with the artisan queue:work command, the Queue monitor will correctly mark the job as failed.

Maybe the job is hitting your configured max_execution_time threshold? In this case it's not possible to mark the previous attempt as failed since the PHP process will be killed immediately. The following attempt will also generate a new job_id

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use romanzipp\QueueMonitor\Traits\IsMonitored;

class TestJob implements ShouldQueue
{
    use Dispatchable;
    use InteractsWithQueue;
    use IsMonitored;
    use Queueable;
    use SerializesModels;
    use IsMonitored;

    public function handle()
    {
        sleep(30);
    }

    public function failed()
    {
        logger('TIMEOUT OCCURED');
    }
}

Screen Shot 2021-01-22 at 08 01 22

romanzipp avatar Jan 22 '21 07:01 romanzipp

Same happened to me, I have cases when a job would stop running (I think it is the OS killing the process, still looking into it) and the job would be stuck as running forever

javis avatar Mar 02 '21 16:03 javis

There another possible reason that keep some jobs stuck on running and is when jobs are distributed across multiple Laravel workers on different servers (distributed system).

See https://github.com/romanzipp/Laravel-Queue-Monitor/pull/68 for more details.

juanparati avatar May 20 '21 12:05 juanparati

I have the same issue with this. Instead of update the existing record, this package creates new row.

How to reproduce:

  1. Set a new jobs (pending jobs)
  2. Run the worker
  3. In the middle of running, interrupt the worker and close it
  4. Restart the worker

You'll see a new record in database. Take a look at my example below

screenshot of duplicate row and job stuck on running

image

As you can see, there are 2 FakeJob with number 21. This package should've updated the FakeJob 21 instead creating a new one.

I think the given solution by @ldmuniz should work

evaleries avatar Oct 17 '21 23:10 evaleries

I've looked into this issue and filed PR #83 which marks previous jobs with the same job id and status "running" as failed.

Could you try out this fix?

Screenshot 2021-10-18 at 07 03 32

@juanparati has submitted a similar PR #68 but i lost track of it since I could not reproduce it on my system.

romanzipp avatar Oct 18 '21 05:10 romanzipp

I'd like to avoid reusing the same monitor instance and overriding the attributes or deleting the previous instance since it I'm currently working on a new database structure which takes stale jobs into consideration

romanzipp avatar Oct 18 '21 05:10 romanzipp

I've tested PR #83, and the problem still occurs. I suggest you mark the job as finished and set failed as true since we verify the job status is finished from the isFinished() method.

evaleries avatar Oct 19 '21 17:10 evaleries