[Bug] Incorrect timer cancellation when workflow worker is down
Describe the bug
We recently discovered an interesting behavior related to cancelling a workflow with an active timer.
- A workflow with a 10-second timer is started.
- Another workflow is triggered that causes the worker to restart (due to a memory leak).
- We attempt to cancel the first workflow (while the worker is still down).
If the worker is down or restarting when the cancellation request is made for a workflow with an active timer, the timer does not get cancelled. In the Temporal UI, the following error appears:
Workflow Task Failed: BadCancelTimerAttributes: invalid history builder state for action: add-timer-canceled-event, TimerID: 5
After the worker restarts, the workflow with the timer remains in the Running state until the WorkflowExecutionTimeout is reached. This behavior seems incorrect.
The expected behavior would be for the cancellation attempt to be retried after the error, or for the entire workflow to fail with an error.
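Until this is fixed, a client-side workaround could look roughly like the sketch below: request cancellation, check a few times whether the workflow has actually stopped, and fall back to termination if it has not. Termination is applied by the Temporal server and does not need a worker to process a workflow task. Note that cancelWithFallback is a hypothetical helper, not part of the Temporal SDK. The $isRunning callable is an assumed status check that the caller has to provide. The other two callables would wrap the stub's cancel() and terminate() methods.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the Temporal SDK).
 * Requests cancellation, polls until the workflow stops,
 * and terminates it as a last resort.
 *
 * @param callable(): void $cancel    e.g. fn() => $stub->cancel()
 * @param callable(): bool $isRunning assumed status check supplied by the caller
 * @param callable(): void $terminate e.g. fn() => $stub->terminate('cancel was not applied')
 *
 * @return string 'stopped' if the workflow stopped on its own, 'terminated' otherwise
 */
function cancelWithFallback(
    callable $cancel,
    callable $isRunning,
    callable $terminate,
    int $maxChecks = 5,
    int $delayMs = 1000,
): string {
    $cancel();

    for ($i = 0; $i < $maxChecks; $i++) {
        if (!$isRunning()) {
            return 'stopped';
        }
        // Give the worker time to come back and process the cancellation.
        usleep($delayMs * 1000);
    }

    // The cancellation was never applied, so stop the workflow server-side.
    $terminate();

    return 'terminated';
}
```

In the reproduction below this would replace the bare $runningTimerWorkflow->cancel() call. It only hides the symptom on the client side. The failed workflow task itself still needs a fix in the SDK or RoadRunner.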
Environment/Versions
- Temporal: 1.26.2
- PHP: 8.2
- Roadrunner: 2024.3.5
- Symfony: 6.4
- temporal-sdk: 2.13.4
Minimal Reproduction
TimerWorkflow.php
<?php

declare(strict_types=1);

namespace App\Command\MemoryLeak\Workflow;

use Temporal\DataConverter\Type;
use Temporal\Workflow;
use Temporal\Workflow\ReturnType;
use Temporal\Workflow\WorkflowInterface;
use Temporal\Workflow\WorkflowMethod;

#[WorkflowInterface]
final class TimerWorkflow
{
    #[WorkflowMethod('timer')]
    #[ReturnType(Type::TYPE_STRING)]
    public function expire(): \Generator
    {
        yield Workflow::timer(10);

        return yield 'Timer';
    }
}
MemoryLeakWorkflow.php
<?php

declare(strict_types=1);

namespace App\Command\MemoryLeak\Workflow;

use Temporal\DataConverter\Type;
use Temporal\Workflow\ReturnType;
use Temporal\Workflow\WorkflowInterface;
use Temporal\Workflow\WorkflowMethod;

#[WorkflowInterface]
final class MemoryLeakWorkflow
{
    #[WorkflowMethod('memory-leak')]
    #[ReturnType(Type::TYPE_STRING)]
    public function create(): \Generator
    {
        $arr = [];
        while (true) {
            $arr[] = 'test test test test test';
        }

        return yield 'memory-leak';
    }
}
RunTimerCommand.php
<?php

declare(strict_types=1);

namespace App\Command\MemoryLeak;

use App\Command\MemoryLeak\Workflow\MemoryLeakWorkflow;
use App\Command\MemoryLeak\Workflow\TimerWorkflow;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Style\SymfonyStyle;
use Temporal\Client\WorkflowClientInterface;
use Temporal\Client\WorkflowOptions;
use Temporal\Common\IdReusePolicy;

#[AsCommand(name: 'dev:memory-leak-timer')]
final class RunTimerCommand extends Command
{
    public function __construct(
        private readonly WorkflowClientInterface $workflowClient,
    ) {
        parent::__construct();
    }

    protected function configure(): void {}

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $io = new SymfonyStyle($input, $output);

        $workflow1 = $this->workflowClient->newWorkflowStub(
            TimerWorkflow::class,
            WorkflowOptions::new()
                ->withWorkflowId('timer')
                ->withTaskQueue('default')
                ->withWorkflowIdReusePolicy(IdReusePolicy::AllowDuplicate)
                ->withWorkflowRunTimeout(20)
                ->withWorkflowExecutionTimeout(30),
        );

        $workflow2 = $this->workflowClient->newWorkflowStub(
            MemoryLeakWorkflow::class,
            WorkflowOptions::new()
                ->withWorkflowId('memory-leak')
                ->withTaskQueue('default')
                ->withWorkflowIdReusePolicy(IdReusePolicy::AllowDuplicate)
                ->withWorkflowRunTimeout(20)
                ->withWorkflowExecutionTimeout(30),
        );

        $this->workflowClient->start($workflow1);
        $io->success('Timer started');

        $this->workflowClient->start($workflow2);
        $io->success('MemoryLeak started');

        $runningTimerWorkflow = $this->workflowClient->newUntypedRunningWorkflowStub('timer');
        $runningTimerWorkflow->cancel();
        $io->success('Timer canceled');

        return Command::SUCCESS;
    }
}
Also reproduced on RR v2024.1.5 (Jun 20, 2024), SDK v2.10.3 (Jul 5, 2024), Temporal CLI v1.0.0 (Aug 6, 2024).
This should be fixed in the next RoadRunner release this Thursday.
@roxblnfk I just wanted to check if there are any updates regarding Roadrunner. It doesn’t seem to have been updated yet.
Hi. The problem turned out to be a bit deeper, so we postponed the RR release. I hope to come back with good news soon.
@faim87 hi. Did the latest patch solve the problem 100%?