laravel horizon jobs stuck in pending

Open khalidgxg opened this issue 2 years ago • 27 comments

Horizon Version

5.15

Laravel Version

10.09.0

PHP Version

8.1

Redis Driver

Predis

Redis Version

2.1.2

Database Driver & Version

No response

Description

Hi all,

I am writing to report a bug related to the job processing in Laravel Horizon. Specifically, I have encountered a situation where a job appears to be stuck in the "pending" status despite having a "completed_at" timestamp, and another job is stuck in the "reserved" status without progressing further.

Here are the details of the problematic jobs:

  • Job with "pending" status but a "completed_at" timestamp:

    "status": "pending", "completed_at": "1685113859.5327", "reserved_at": "1685113859.5284",

  • Job with "reserved" status :

    "status": "reserved", "completed_at": null, "reserved_at": "1685110558.3953",

In the first case, the job has a "completed_at" timestamp indicating its successful completion, but it remains in the "pending" status. On the other hand, the second job is stuck in the "reserved" status without progressing further.
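The two inconsistent states described above can be detected mechanically. The following is a minimal sketch, assuming job records are available as plain arrays with the fields quoted in this report; `classifyJobRecord` and the one-hour threshold are hypothetical and not part of Horizon.

```php
<?php

declare(strict_types=1);

/**
 * Classify a Horizon-style job record (hypothetical helper, not part of Horizon).
 * Returns a short label describing whether the record looks inconsistent.
 */
function classifyJobRecord(array $job, float $now, float $maxReservedSeconds = 3600.0): string
{
    $status = $job['status'] ?? null;
    $completedAt = $job['completed_at'] ?? null;
    $reservedAt = $job['reserved_at'] ?? null;

    // "pending" together with a completion timestamp is contradictory.
    if ($status === 'pending' && $completedAt !== null) {
        return 'pending-but-completed';
    }

    // "reserved" for longer than the threshold, with no completion timestamp.
    if ($status === 'reserved' && $completedAt === null && $reservedAt !== null
        && ($now - (float) $reservedAt) > $maxReservedSeconds) {
        return 'reserved-too-long';
    }

    return 'ok';
}

// The two records quoted in this report.
$first = ['status' => 'pending', 'completed_at' => '1685113859.5327', 'reserved_at' => '1685113859.5284'];
$second = ['status' => 'reserved', 'completed_at' => null, 'reserved_at' => '1685110558.3953'];

// Assumed "now": two hours after the second job was reserved.
$now = 1685110558.3953 + 7200;

echo classifyJobRecord($first, $now), PHP_EOL;
echo classifyJobRecord($second, $now), PHP_EOL;
```

A check like this only flags suspicious records; it says nothing about why they ended up in that state.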

Could you please investigate this issue and provide guidance on how to resolve it? It seems that the job statuses are not being updated correctly, causing confusion in monitoring and processing.

Thank you for your attention to this matter. I look forward to your response and assistance in resolving this bug.

Best regards, khalid

Steps To Reproduce

  1. Set up a Laravel application with Laravel Horizon installed. Make sure you have the necessary dependencies and configurations in place.
  2. Create a job that exhibits the issue. For example, you can create a custom job class StuckJob that performs some task, such as writing to a log file or making an API request.
  3. Configure your application to use Laravel Horizon as the queue driver. Ensure that the queue connections and supervisors are properly set up.
  4. Push multiple instances of the StuckJob to the queue using the Laravel Horizon job queueing mechanism. You can use the dispatch() function or Horizon-specific methods like Horizon::queue() to push the jobs.
  5. Monitor the Horizon dashboard to observe the job processing. Keep an eye on the status of the jobs you pushed.
  6. Check if any of the jobs get stuck in the "pending" status despite having a "completed_at" timestamp. Note down the relevant job ID, connection, queue, and payload details.
  7. Repeat the process with another job to observe if any jobs get stuck in the "reserved" status without progressing further. Again, note down the relevant job ID, connection, queue, and payload details.
  8. Take note of the Laravel Horizon version you are using in your application.

khalidgxg avatar May 29 '23 10:05 khalidgxg

Hey there,

Can you first please try one of the support channels below? If you can actually identify this as a bug, feel free to open up a new issue with a link to the original one and we'll gladly help you out.

Thanks!

driesvints avatar May 29 '23 11:05 driesvints

ok thanks

khalidgxg avatar May 29 '23 11:05 khalidgxg

Got the same thing happening, latest Laravel & Horizon versions.

Even though the jobs are processed correctly, they keep accumulating in the Pending list in the dashboard while the queues themselves are empty.

I could not find more accurate reports or solutions specifically to this situation.

@khalidgxg did you learn what was causing it for you?

And @driesvints I assume that if someone can provide a repo with Laravel + Horizon that consistently reproduces the issue as reported, it would then qualify for you guys to pursue it as a bug, right? I might try to put something together.

fabriciojs avatar Aug 22 '23 14:08 fabriciojs

@driesvints @fabriciojs https://github.com/laravel/horizon/issues/1034

cccdz avatar Sep 01 '23 07:09 cccdz

Hey all. This should have been fixed already in 4.x, does that not work for you? https://github.com/laravel/telescope/pull/1349

driesvints avatar Sep 01 '23 07:09 driesvints

Telescope that is.

driesvints avatar Sep 01 '23 07:09 driesvints

@driesvints https://github.com/laravel/horizon/issues/1185 This happens when the consumer finishes the job before the producer-side event has been handled. Horizon is implemented through events, and the job is dropped into the queue before the corresponding event is triggered, so a worker can finish consuming the job before the event for pushing it has been executed.
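The ordering problem described in this comment can be illustrated with a small in-memory model. This is a simplified sketch, not Horizon's actual code: `NaiveJobStore` and its methods are hypothetical stand-ins for event listeners that each write their own status without checking what is already stored.

```php
<?php

declare(strict_types=1);

/**
 * Simplified in-memory model of job bookkeeping (NOT Horizon's actual code).
 * Each handler blindly writes its own status, like independent event listeners.
 */
final class NaiveJobStore
{
    /** @var array<string, array<string, mixed>> job id => record */
    public array $jobs = [];

    /** @var array<string, true> ids currently listed as pending */
    public array $pending = [];

    public function pushed(string $id): void
    {
        $this->jobs[$id] = array_merge($this->jobs[$id] ?? [], ['status' => 'pending']);
        $this->pending[$id] = true;
    }

    public function completed(string $id, float $at): void
    {
        $this->jobs[$id] = array_merge($this->jobs[$id] ?? [], [
            'status' => 'completed',
            'completed_at' => $at,
        ]);
        unset($this->pending[$id]);
    }
}

// Expected order: pushed, then completed.
$ordered = new NaiveJobStore();
$ordered->pushed('job-1');
$ordered->completed('job-1', 1685113859.5327);

// Racy order: the worker finishes before the "pushed" handler runs.
$racy = new NaiveJobStore();
$racy->completed('job-1', 1685113859.5327);
$racy->pushed('job-1');

echo $ordered->jobs['job-1']['status'], PHP_EOL; // completed
echo $racy->jobs['job-1']['status'], PHP_EOL;    // pending, with completed_at still set
```

In the racy run the record ends up with status "pending" and a "completed_at" value, and the job stays in the pending list, which matches the symptom in the original report.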

cccdz avatar Sep 01 '23 07:09 cccdz

image

cccdz avatar Sep 01 '23 07:09 cccdz

Thank you for reporting this issue!

As Laravel is an open source project, we rely on the community to help us diagnose and fix issues as it is not possible to research and fix every issue reported to us via GitHub.

If possible, please make a pull request fixing the issue you have described, along with corresponding tests. All pull requests are promptly reviewed by the Laravel team.

Thank you!

github-actions[bot] avatar Sep 01 '23 08:09 github-actions[bot]

Thank you. We'd appreciate any help through a PR for this.

driesvints avatar Sep 01 '23 08:09 driesvints

image image

I have a suggestion: these two listener methods could be implemented with a Lua script. In the Lua script for the pushed method, check whether the key already exists. If it does, the completion event has already been handled, so the job should not be written to the pending sorted set; only the hash key should be updated. I do not know whether this is feasible.
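The logic of this suggestion can be sketched in memory as follows. This is only an illustration under assumptions: `GuardedJobStore` is hypothetical, the choice to keep the existing status when the key exists is one reading of "just update the hash key", and in Redis the check and the write would have to run atomically (for example in a single Lua script), which this model does not show.

```php
<?php

declare(strict_types=1);

/**
 * In-memory sketch of a guarded "pushed" handler (hypothetical, not Horizon code).
 */
final class GuardedJobStore
{
    /** @var array<string, array<string, mixed>> job id => record */
    public array $jobs = [];

    /** @var array<string, true> ids currently listed as pending */
    public array $pending = [];

    public function pushed(string $id, array $fields): void
    {
        if (isset($this->jobs[$id])) {
            // Key already exists: a later event was handled first.
            // Add the new fields, keep the existing status, and do not
            // add the job to the pending list.
            $this->jobs[$id] = array_merge($fields, $this->jobs[$id]);
            return;
        }

        $this->jobs[$id] = array_merge($fields, ['status' => 'pending']);
        $this->pending[$id] = true;
    }

    public function completed(string $id, float $at): void
    {
        $this->jobs[$id] = array_merge($this->jobs[$id] ?? [], [
            'status' => 'completed',
            'completed_at' => $at,
        ]);
        unset($this->pending[$id]);
    }
}

// Racy order: completion is recorded before the "pushed" handler runs.
$store = new GuardedJobStore();
$store->completed('job-1', 1685113859.5327);
$store->pushed('job-1', ['name' => 'StuckJob']);

echo $store->jobs['job-1']['status'], PHP_EOL; // completed
echo count($store->pending), PHP_EOL;          // 0
```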

cccdz avatar Sep 01 '23 09:09 cccdz

@driesvints is there a way to release these reserved jobs so they can be processed again?

joelvh avatar Sep 22 '23 01:09 joelvh

I don't know, sorry.

driesvints avatar Sep 22 '23 07:09 driesvints

@themsaid do you maybe know if it's possible to release these jobs that get stuck as reserved to be processed again? Thanks!

joelvh avatar Sep 25 '23 22:09 joelvh

I'd like to add here as well that we're experiencing the same issue with jobs being stuck in 'pending' even if completed_at is present in horizon dashboard, and the jobs actually completed successfully.

We have not been able to determine the root cause. We dispatch millions of jobs a month, and it affects only approximately 20 per day. Still, it's rather unsettling and we'd love to find a solution.

graemlourens avatar Nov 20 '23 06:11 graemlourens

@pnlinh you are WAY out of date with laravel, horizon & php. Please update to most recent versions and test again. There is no sense in asking for help with such outdated versions.

graemlourens avatar Nov 21 '23 09:11 graemlourens

Thanks for your suggestion, but my project cannot be upgraded right now. I added a delay value to the jobs, and that seems to work.

pnlinh avatar Nov 21 '23 10:11 pnlinh

@pnlinh please try to focus the discussion on supported Laravel/Horizon versions, thanks.

driesvints avatar Nov 27 '23 11:11 driesvints

Even with updated versions I still have this issue.

fwilliamconceicao avatar Jan 22 '24 20:01 fwilliamconceicao

I set 'TELESCOPE_JOB_WATCHER' to false in the config, and the jobs all came flooding back into completed.

"Watchers\JobWatcher::class => env('TELESCOPE_JOB_WATCHER', false)"

As mentioned in https://github.com/laravel/telescope/pull/1349#issuecomment-1645425830
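For context, the line quoted in this comment belongs in the `watchers` array of Telescope's config file. The fragment below is a sketch of where it would sit, assuming the standard published `config/telescope.php`; all other options are elided and stay unchanged.

```php
<?php

// config/telescope.php (fragment; other options omitted)

use Laravel\Telescope\Watchers;

return [
    'watchers' => [
        // Job watcher disabled unless TELESCOPE_JOB_WATCHER is set in the environment.
        Watchers\JobWatcher::class => env('TELESCOPE_JOB_WATCHER', false),

        // ... remaining watchers unchanged ...
    ],
];
```

Setting `TELESCOPE_JOB_WATCHER=false` in `.env` has the same effect without changing the default in the config file.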

ithuis avatar Jan 23 '24 10:01 ithuis

same issue..

Kladislav avatar Jan 26 '24 15:01 Kladislav

Hey all. Extra messages that you're experiencing this issue aren't really helpful. Instead, please try posting extra findings around the issue or help out with a PR, thanks.

driesvints avatar Feb 09 '24 08:02 driesvints

I've been facing this problem for almost 3 years, where the internal solution provided is sleep(3) inside all jobs. https://github.com/laravel/horizon/issues/1034

The suggestion from @cccdz above makes sense to me, since with sleep(3) the events have time to be handled in their normal order.

Would this be a possible point of investigation?

lucaspanik avatar Feb 26 '24 16:02 lucaspanik

This worked for me for a couple of months, but once the application scaled up and we had more workload it became a huge headache.

What I'm doing right now is migrating everything to serverless services, jobs, and isolated applications in C#.

The only way to stop this behavior is to stop using Horizon for huge workloads.

fwilliamconceicao avatar Feb 26 '24 16:02 fwilliamconceicao

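use Laravel\Horizon\Contracts\JobRepository;
use Laravel\Horizon\JobPayload;
use Laravel\Horizon\Repositories\RedisJobRepository as HorizonRedisJobRepository;

class RedisJobRepository extends HorizonRedisJobRepository
{
    /**
     * Mark the job as reserved.
     *
     * @param string $connection
     * @param string $queue
     * @param JobPayload $payload
     * @return void
     * @throws RedisException
     */
    public function reserved($connection, $queue, JobPayload $payload): void
    {
        // Total time spent waiting so far, in microseconds
        $totalTime = 0;

        // Wait until the Horizon job record has reached the "pending" status.
        // Note: redis() is not a Laravel helper; it is presumably the author's
        // own function returning the "horizon" Redis connection.
        while ('pending' !== redis('horizon')->hget($payload->id(), 'status')) {
            // Give up once the loop has waited 1 s or longer
            if ($totalTime >= 1000000) {
                break;
            }

            // Sleep 5 ms
            usleep(5000);

            $totalTime += 5000;
        }

        parent::reserved($connection, $queue, $payload);
    }
}


// In a service provider's register() method:
$this->app->singleton(JobRepository::class, RedisJobRepository::class);

The problem can be temporarily avoided this way.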

cccdz avatar Feb 27 '24 10:02 cccdz

This is a good workaround, though. But have you tested it with a huge workload? My workload is very big, and when I started adding a sleep of 2000 everything started to overlap. I didn't try with 5k. It might be a good workaround, but either way it's not a proper solution.

fwilliamconceicao avatar Feb 27 '24 12:02 fwilliamconceicao

What this does is check, for up to 1 second and at 5 ms intervals, whether the job has reached the pending state. If it has, the pushed event has already been handled and the next operation can proceed. I also added a safeguard: if the event has still not been handled after 1 second, the loop stops waiting and the job is left in the pending list, but this extreme case almost never happens.
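The bounded polling described here can be isolated into a small helper. This is a minimal sketch, assuming the same 1 second limit and 5 ms interval; `waitUntil` is a hypothetical name, and like the loop in the workaround it counts time slept rather than wall-clock time.

```php
<?php

declare(strict_types=1);

/**
 * Poll $check until it returns true or $timeoutUs microseconds have been slept.
 * Returns true if the condition was met, false if the wait timed out.
 */
function waitUntil(callable $check, int $timeoutUs = 1000000, int $intervalUs = 5000): bool
{
    $waitedUs = 0;

    while (!$check()) {
        if ($waitedUs >= $timeoutUs) {
            return false;
        }

        usleep($intervalUs);
        $waitedUs += $intervalUs;
    }

    return true;
}

// Stand-in for reading the job status: reports "pending" on the third read.
$reads = 0;
$statusIsPending = function () use (&$reads): bool {
    return ++$reads >= 3;
};

var_dump(waitUntil($statusIsPending));            // condition met after two sleeps
var_dump(waitUntil(fn (): bool => false, 20000)); // gives up after about 20 ms of sleeping
```

Returning a boolean lets the caller decide what to do on timeout, for example log the job ID instead of silently continuing.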

cccdz avatar Feb 27 '24 12:02 cccdz

Hi all. This issue is now one year old. We haven't seen any further activity on it and nobody seems to have attempted a PR. Therefore we'll be closing this one. If anyone still finds a solution, we'd be more than willing to accept a PR. Thanks

driesvints avatar Jun 03 '24 12:06 driesvints

Just for the record: the issue is happening for us nearly daily, even with a very low system baseload of 1 million jobs a day, whereby approximately 3-10 jobs still remain in pending, even if completed.

It still happens with the most recent Laravel and Horizon versions currently available.

graemlourens avatar Jun 14 '24 07:06 graemlourens

Why isn't this issue still open? It's still happening to everyone here, and I have not seen any clear solution in the comments.

buglinjo avatar Jun 03 '25 20:06 buglinjo