TestWorkflowMutableStateImpl - race condition between TimerFired event and CancelTimer command
processCancelTimer throws an INVALID_ARGUMENT exception at the line linked below if the timer is null. However, the timer will already have been removed if the same workflow task receives a TIMER_FIRED event for that timer:
https://github.com/temporalio/sdk-java/blob/master/temporal-test-server/src/main/java/io/temporal/internal/testservice/TestWorkflowMutableStateImpl.java#L1437
I think that on a cancel command, when the timer is null, we should first check whether it was actually removed within the same workflow task, and only throw if it was not.
This issue prevents the workflow under test from completing or making progress.
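A minimal sketch of that check, to make the proposal concrete. This is not the actual TestWorkflowMutableStateImpl code: the class TimerStateSketch, its methods, the firedInCurrentTask set, and the exception message are all hypothetical, and the real implementation tracks timers through state machines rather than a plain map.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of the timer bookkeeping; not the real test-server class. */
class TimerStateSketch {
  // Pending timers keyed by timer id (stand-in for the real timer state machines).
  private final Map<String, Object> timers = new HashMap<>();
  // Assumed extra bookkeeping: timers that fired since the current workflow task started.
  private final Set<String> firedInCurrentTask = new HashSet<>();

  void startTimer(String timerId) {
    timers.put(timerId, new Object());
  }

  /** Called when a new workflow task starts; earlier firings no longer excuse a cancel. */
  void startWorkflowTask() {
    firedInCurrentTask.clear();
  }

  /** Models TIMER_FIRED: the timer is removed and remembered for the current task. */
  void fireTimer(String timerId) {
    if (timers.remove(timerId) != null) {
      firedInCurrentTask.add(timerId);
    }
  }

  /**
   * Models the CancelTimer command.
   *
   * @return true if a pending timer was canceled, false if the cancel was ignored
   *     because the timer already fired during the current workflow task
   * @throws IllegalArgumentException if the timer is unknown for any other reason
   */
  boolean processCancelTimer(String timerId) {
    if (timers.remove(timerId) != null) {
      return true; // normal case: the timer was still pending
    }
    if (firedInCurrentTask.contains(timerId)) {
      return false; // lost the race with TIMER_FIRED: tolerate instead of throwing
    }
    throw new IllegalArgumentException("timer not found: " + timerId);
  }
}
```

With this shape, a cancel that loses the race to a TIMER_FIRED event in the same workflow task is treated as a no-op, while a cancel for a timer that never existed (or that fired in an earlier task) still fails as it does today.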
Full error can look like:
[Workflow Executor taskQueue="flakyservice", namespace="default": 1] WARN io.temporal.internal.worker.WorkflowWorker - Failure while reporting workflow progress to the server. If seen continuously the workflow might be stuck. WorkflowId=flaky, RunId=6d9e3f8a-7a73-4aaf-8cdd-02a3cee750f1, startedEventId=22
io.grpc.StatusRuntimeException: INVALID_ARGUMENT: invalid history builder state for action
    at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:268)
    at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:249)
    at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:167)
    at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.respondWorkflowTaskCompleted(WorkflowServiceGrpc.java:6079)
    at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.lambda$sendTaskCompleted$0(WorkflowWorker.java:557)
    at io.temporal.internal.retryer.GrpcSyncRetryer.retry(GrpcSyncRetryer.java:49)
    at io.temporal.internal.retryer.GrpcRetryer.retryWithResult(GrpcRetryer.java:40)
    at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.sendTaskCompleted(WorkflowWorker.java:552)
    at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:409)
    at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:336)
    at io.temporal.internal.worker.PollTaskExecutor.lambda$process$1(PollTaskExecutor.java:76)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Still working on a reliable test for this. @Quinn-With-Two-Ns, ping me and I can point you to the Slack thread, which has a reproduction; you may need to run it a number of times to hit this.