sdk-php

[Bug] Incorrect timer cancellation when workflow worker is down

Open • faim87 opened this issue 7 months ago • 1 comment

Describe the bug

We recently ran into unexpected behavior when cancelling a workflow that has an active timer:

  1. A workflow with a 10-second timer is started.
  2. A second workflow is started that leaks memory until the worker restarts.
  3. We try to cancel the first workflow while the worker is still down.

If the cancellation request for the workflow with the timer arrives while the worker is restarting, the timer is not cancelled. The Temporal UI shows the following error:

Workflow Task Failed: BadCancelTimerAttributes: invalid history builder state for action: add-timer-canceled-event, TimerID: 5

After the worker comes back up, the workflow with the timer stays in the Running state until its WorkflowExecutionTimeout is reached. This seems incorrect. We would expect either the cancellation to be retried after the error, or the whole workflow to fail with an error.

Environment/Versions

Temporal: 1.26.2
PHP: 8.2
RoadRunner: 2024.3.5
Symfony: 6.4
temporal-sdk: 2.13.4

Minimal Reproduction

TimerWorkflow.php

<?php

declare(strict_types=1);

namespace App\Command\MemoryLeak\Workflow;

use Temporal\DataConverter\Type;
use Temporal\Workflow;
use Temporal\Workflow\ReturnType;
use Temporal\Workflow\WorkflowInterface;
use Temporal\Workflow\WorkflowMethod;

#[WorkflowInterface]
final class TimerWorkflow
{
    #[WorkflowMethod('timer')]
    #[ReturnType(Type::TYPE_STRING)]
    public function expire(): \Generator
    {
        yield Workflow::timer(10);

        return yield 'Timer';
    }
}

MemoryLeakWorkflow.php

<?php

declare(strict_types=1);

namespace App\Command\MemoryLeak\Workflow;

use Temporal\DataConverter\Type;
use Temporal\Workflow\ReturnType;
use Temporal\Workflow\WorkflowInterface;
use Temporal\Workflow\WorkflowMethod;

#[WorkflowInterface]
final class MemoryLeakWorkflow
{
    #[WorkflowMethod('memory-leak')]
    #[ReturnType(Type::TYPE_STRING)]
    public function create(): \Generator
    {
        $arr = [];
        while (true) {
            $arr[] = 'test test test test test';
        }

        return yield 'memory-leak';
    }
}

RunTimerCommand.php

<?php

declare(strict_types=1);

namespace App\Command\MemoryLeak;

use App\Command\MemoryLeak\Workflow\MemoryLeakWorkflow;
use App\Command\MemoryLeak\Workflow\TimerWorkflow;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Style\SymfonyStyle;
use Temporal\Client\WorkflowClientInterface;
use Temporal\Client\WorkflowOptions;
use Temporal\Common\IdReusePolicy;

#[AsCommand(name: 'dev:memory-leak-timer')]
final class RunTimerCommand extends Command
{
    public function __construct(
        private readonly WorkflowClientInterface $workflowClient,
    ) {
        parent::__construct();
    }

    protected function configure(): void {}

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $io = new SymfonyStyle($input, $output);

        $workflow1 = $this->workflowClient->newWorkflowStub(
            TimerWorkflow::class,
            WorkflowOptions::new()
                ->withWorkflowId('timer')
                ->withTaskQueue('default')
                ->withWorkflowIdReusePolicy(IdReusePolicy::AllowDuplicate)
                ->withWorkflowRunTimeout(20)
                ->withWorkflowExecutionTimeout(30),
        );

        $workflow2 = $this->workflowClient->newWorkflowStub(
            MemoryLeakWorkflow::class,
            WorkflowOptions::new()
                ->withWorkflowId('memory-leak')
                ->withTaskQueue('default')
                ->withWorkflowIdReusePolicy(IdReusePolicy::AllowDuplicate)
                ->withWorkflowRunTimeout(20)
                ->withWorkflowExecutionTimeout(30),
        );

        $this->workflowClient->start($workflow1);
        $io->success('Timer started');

        $this->workflowClient->start($workflow2);
        $io->success('MemoryLeak started');

        $runningTimerWorkflow = $this->workflowClient->newUntypedRunningWorkflowStub('timer');
        $runningTimerWorkflow->cancel();
        $io->success('Timer canceled');

        return Command::SUCCESS;
    }
}

faim87 • Jun 05 '25

Also reproduced on RR v2024.1.5 (Jun 20, 2024), SDK v2.10.3 (Jul 5, 2024), and Temporal CLI v1.0.0 (Aug 6, 2024).

reproduce.zip

roxblnfk • Jun 10 '25

This should be fixed in the next RoadRunner release this Thursday.

roxblnfk • Jul 07 '25

@roxblnfk I just wanted to check whether there are any updates on the RoadRunner side. It doesn't seem to have been released yet.

faim87 • Jul 21 '25

Hi. The problem turned out to be deeper than we expected, so we postponed the RR release. I hope to come back with good news soon.

roxblnfk • Jul 21 '25

@faim87 Hi. Did the latest patch fully solve the problem?

roxblnfk • Aug 05 '25