sdk-java icon indicating copy to clipboard operation
sdk-java copied to clipboard

Exceptions thrown from workflow executor code shouldn't be subject to failWorkflowExceptionTypes

Open Spikhalskiy opened this issue 2 years ago • 2 comments

Currently, if our workflow executor / control code throws an exception, it's a subject for WorkflowImplementationOptions#failWorkflowExceptionTypes logic and users can set up their system in a way that such exceptions may fail the workflow execution.

An example: Workflow Executor throws java.lang.IllegalStateException: History is out of order if it received a sticky task and its cached instance is outdated. Our expectation is that this exception will fail WFT and cause a full replay and an execution on another (or the same) worker.

In reality, users may set failWorkflowExceptionTypes into a broad category like Exception and such an internal exception will start to fail workflow executions. Doing this, users expect failWorkflowExceptionTypes to be applied only to the workflow exceptions, but it's also applied to the workflow executor exception.

Proposed solution:

Don't apply WorkflowImplementationOptions#failWorkflowExceptionTypes to the exceptions from the control code. Exceptions thrown by the control code should always fail Workflow Task, not a Workflow Execution. There needs to be one (?at least one?) exception from this new rule - io.temporal.worker.NonDeterministicException. We have users that explicitly want to fail executions if a non-deterministic error happens.

Spikhalskiy avatar Feb 28 '23 20:02 Spikhalskiy

I think we should keep the current logic of applying failWorkflowExceptionTypes to any unexpected error from the control code. But for situations that we know are intermittent we should try to retry.

Another option is to apply failWorkflowExceptionTypes on a second or (Nth) task attempt. This way, intermittent failures would not lead to workflow failures.

mfateev avatar Feb 28 '23 21:02 mfateev

We should be throwing an explicit FailWorkflowTaskException of something like this from the SDK internals if we expect that this exception should always be failing the WFT only and never follow the user-specified criteria.

Spikhalskiy avatar Apr 20 '23 18:04 Spikhalskiy