laravel-workflow icon indicating copy to clipboard operation
laravel-workflow copied to clipboard

Child workflow with inner `all` causes activities to sometimes execute multiple times across queue workers

Open CadenP opened this issue 1 year ago • 10 comments

I have a child workflow being executed. Within that child workflow is ActivityStub::all. Activities are sometimes run more than once if multiple queue workers are open.

The following is a somewhat minimal reproduction.

I executed it in an artisan command using:

$workflow = WorkflowStub::make(ExampleWorkflow::class);
$workflow->start('123');
<?php

namespace App\Console\Commands\WorkflowTest;

use Workflow\Activity;
use Workflow\ActivityStub;
use Workflow\ChildWorkflowStub;
use Workflow\Workflow;

class ExampleWorkflow extends Workflow
{
    public function execute(string $someId)
    {
        yield ChildWorkflowStub::make(InnerWorkflow::class, $someId);

        return yield ActivityStub::make(FinishWorkflowActivity::class, $someId);
    }
}

class InnerWorkflow extends Workflow
{
    public function execute(string $someId)
    {
        yield ActivityStub::all([
            ActivityStub::make(InnerActivity1::class, $someId, 'one'),
            ActivityStub::make(InnerActivity1::class, $someId, 'one-two'),
        ]);
        yield ActivityStub::make(InnerActivity2::class, $someId, 'two');

        return yield ActivityStub::make(InnerActivity3::class, $someId, 'three');
    }
}

class FinishWorkflowActivity extends Activity
{
    public function execute(string $someId)
    {
        dump("Finishing for {$someId}...");
        sleep(3);
        dump("Finished for {$someId}.");
    }
}

class InnerActivity1 extends Activity
{
    public function execute(string $someId, string $innerId)
    {
        dump("Running inner activity 1 for {$innerId} on {$someId}...");
        sleep(3);
        dump("Ran inner activity 1 for {$innerId} on {$someId}.");
    }
}

class InnerActivity2 extends Activity
{
    public function execute(string $someId, string $innerId)
    {
        dump("Running inner activity 2 for {$innerId} on {$someId}...");
        sleep(3);
        dump("Ran inner activity 2 for {$innerId} on {$someId}.");
    }
}

class InnerActivity3 extends Activity
{
    public function execute(string $someId, string $innerId)
    {
        dump("Running inner activity 3 for {$innerId} on {$someId}...");
        sleep(3);
        dump("Ran inner activity 3 for {$innerId} on {$someId}.");
    }
}

The following is the output I received (only including the dump output and surrounding activity). Note how there are duplicate outputs in both workers. (There were no failures in the complete output.)

Worker 1:

2024-06-27 20:58:59 App\Console\Commands\WorkflowTest\InnerActivity1 ....................... RUNNING
"Running inner activity 1 for one-two on 123..." // app/Console/Commands/WorkflowTest/ExampleWorkflow.php:48
"Ran inner activity 1 for one-two on 123." // app/Console/Commands/WorkflowTest/ExampleWorkflow.php:50
2024-06-27 20:59:02 App\Console\Commands\WorkflowTest\InnerActivity1 ....................... 3s DONE
2024-06-27 20:59:02 App\Console\Commands\WorkflowTest\InnerActivity2 ....................... RUNNING
"Running inner activity 2 for two on 123..." // app/Console/Commands/WorkflowTest/ExampleWorkflow.php:58
"Ran inner activity 2 for two on 123." // app/Console/Commands/WorkflowTest/ExampleWorkflow.php:60
2024-06-27 20:59:05 App\Console\Commands\WorkflowTest\InnerActivity2 ....................... 3s DONE
2024-06-27 20:59:05 App\Console\Commands\WorkflowTest\InnerActivity3 ....................... RUNNING
"Running inner activity 3 for three on 123..." // app/Console/Commands/WorkflowTest/ExampleWorkflow.php:68
"Ran inner activity 3 for three on 123." // app/Console/Commands/WorkflowTest/ExampleWorkflow.php:70
2024-06-27 20:59:08 App\Console\Commands\WorkflowTest\InnerActivity3 ....................... 3s DONE
2024-06-27 20:59:09 App\Console\Commands\WorkflowTest\FinishWorkflowActivity ............... RUNNING
"Finishing for 123..." // app/Console/Commands/WorkflowTest/ExampleWorkflow.php:38
"Finished for 123." // app/Console/Commands/WorkflowTest/ExampleWorkflow.php:40
2024-06-27 20:59:12 App\Console\Commands\WorkflowTest\FinishWorkflowActivity ............... 3s DONE

Worker 2:

2024-06-27 20:58:58 App\Console\Commands\WorkflowTest\InnerActivity1 ....................... RUNNING
"Running inner activity 1 for one on 123..." // app/Console/Commands/WorkflowTest/ExampleWorkflow.php:48
"Ran inner activity 1 for one on 123." // app/Console/Commands/WorkflowTest/ExampleWorkflow.php:50
2024-06-27 20:59:01 App\Console\Commands\WorkflowTest\InnerActivity1 ....................... 3s DONE
2024-06-27 20:59:01 App\Console\Commands\WorkflowTest\InnerActivity1 ....................... RUNNING
"Running inner activity 1 for one-two on 123..." // app/Console/Commands/WorkflowTest/ExampleWorkflow.php:48
"Ran inner activity 1 for one-two on 123." // app/Console/Commands/WorkflowTest/ExampleWorkflow.php:50
2024-06-27 20:59:04 App\Console\Commands\WorkflowTest\InnerActivity1 ....................... 3s DONE
2024-06-27 20:59:04 App\Console\Commands\WorkflowTest\InnerActivity2 ....................... RUNNING
"Running inner activity 2 for two on 123..." // app/Console/Commands/WorkflowTest/ExampleWorkflow.php:58
"Ran inner activity 2 for two on 123." // app/Console/Commands/WorkflowTest/ExampleWorkflow.php:60
2024-06-27 20:59:07 App\Console\Commands\WorkflowTest\InnerActivity2 ....................... 3s DONE
2024-06-27 20:59:07 App\Console\Commands\WorkflowTest\InnerActivity3 ....................... RUNNING
"Running inner activity 3 for three on 123..." // app/Console/Commands/WorkflowTest/ExampleWorkflow.php:68
"Ran inner activity 3 for three on 123." // app/Console/Commands/WorkflowTest/ExampleWorkflow.php:70
2024-06-27 20:59:10 App\Console\Commands\WorkflowTest\InnerActivity3 ....................... 3s DONE

If it matters, I am using laravel/framework:10.48.2 with laravel-workflow/laravel-workflow:1.0.26.

CadenP avatar Jun 27 '24 21:06 CadenP

Hi @CadenP, are you able to find a workaround for this issue? I’m facing the same issue here

yoonwaiyan avatar Jul 31 '24 09:07 yoonwaiyan

Hello @yoonwaiyan, no, I didn't find a workaround. But I also didn't really look for one, as I was only assessing the package. We have a workaround in place with plain Laravel jobs to solve our needs, and I don't want to trade one workaround for another in this case.

CadenP avatar Jul 31 '24 13:07 CadenP

I agree that duplicate work is bad/wasteful but you must be willing to accept it because of the constraints around potentially failed activities. If your activities are properly idempotent, this isn't an issue.

rmcdaniel avatar Jul 31 '24 13:07 rmcdaniel

@rmcdaniel A couple of questions here:

  1. In what way an activity would be considered a failed activity?
  2. The activities that I have are idempotent, and another issue from this is that having an activity that run after .all finished doesn't seem to make much sense. This is a modified source code that I have:
class RecordWorkflow extends Workflow
{
    public function execute($configs)
    {
        $processes = array_map(
            fn ($config) => ActivityStub::make(ProcessRecordActivity::class, $config),
            $configs
        );

        $records = yield ActivityStub::all($processes);

        RecordProcessor::buildReport($records);

        return $records;
    }
}

Or do I have to make RecordProcessor::buildReport to be an activity so that it's queued to run after .all complete?

yoonwaiyan avatar Jul 31 '24 13:07 yoonwaiyan

A failed activity is any failure that happens before the result is successfully stored. So this could mean that some remote API is down or it could be mean that the remote API executed properly but there was an error with the response.

RecordProcessor::buildReport should execute after the return value for $records is set. It does look like you would need another activity for RecordProcessor::buildReport.

rmcdaniel avatar Jul 31 '24 14:07 rmcdaniel

@rmcdaniel and what is the constraint of possible failure of a job that makes Workflow queue multiple copies of the same jobs (activity)? Can you explain in detail how is the multiple jobs being queued? We have some activities that may take longer time to complete or more susceptible to failure such as database calls, but we can't seem to see any consistency or clear indication of what jobs are being queued multiple times.

If you mean that the jobs have failed halfway and that's the reason it's being rerun, it's not the case for the project I'm working on. No failure and it went to the last step which is the RecordProcessor::buildReport.

yoonwaiyan avatar Aug 01 '24 05:08 yoonwaiyan

I agree that duplicate work is bad/wasteful but you must be willing to accept it because of the constraints around potentially failed activities. If your activities are properly idempotent, this isn't an issue.

Thanks for the answers, @rmcdaniel! :pray:

I understand that the need to handle failures implies an activity/workflow may be scheduled more than once and should be idempotent. The part I'm unclear about is this: if there are no failures, should we see the same activity/workflow being scheduled multiple times or is it guaranteed to be scheduled only once?

carloslima avatar Aug 01 '24 06:08 carloslima

The only guarantee is that the entire workflow will be executed at least once. This is known as "at least once" semantics and it differs from "at most once" and "exactly once". This "at least once" semantics means activities may be executed multiple times.

rmcdaniel avatar Aug 01 '24 08:08 rmcdaniel

@rmcdaniel I'm sorry for the confusion. We're facing an error and were too quick to assume it was related to this issue.

I have created a separate issue for the problem we're seeing: #187

Thanks for the assistance :pray:

carloslima avatar Aug 02 '24 08:08 carloslima

Thanks for your explanation, @rmcdaniel. I was misunderstanding idempotency and its necessity, so I appreciate you helping clear that up. This ticket, as written, could be closed.

That said, looking further at the logs I posted, it seems as though there's a different issue I didn't see before: the same activity is being executed twice, simultaneously. Would this cause issues with idempotency, since the first run doesn't have a chance to indicate that it completed?

I've re-written the log output to hopefully make this clearer:

STRT: "Start" QWx: "Queue worker #x" ACTx: "Activity #x"

20:58:58 STRT QW1 ACT1A
20:58:59 STRT QW2 ACT1B
20:59:01 DONE QW1 ACT1A
20:59:01 STRT QW1 ACT1B // still running on QW2
20:59:02 DONE QW2 ACT1B
20:59:02 STRT QW2 ACT2
20:59:04 DONE QW1 ACT1B
20:59:04 STRT QW1 ACT2 // still running on QW2
20:59:05 DONE QW2 ACT2
20:59:05 STRT QW2 ACT3
20:59:07 DONE QW1 ACT2
20:59:07 STRT QW1 ACT3 // still running on QW2
20:59:08 DONE QW2 ACT3
20:59:09 STRT QW2 finishing all // ACT3 on QW1 is still running, but it finished on QW2
20:59:10 DONE QW1 ACT3
20:59:12 DONE QW2 finishing all

CadenP avatar Aug 02 '24 21:08 CadenP