
[Workflows] Error propagation from Activity to Workflow does not work

Open olitomlinson opened this issue 5 months ago • 4 comments

Expected Behavior

When an Activity throws an Error, I would expect the error to be successfully propagated back to the Workflow, where it can be handled gracefully using a normal try/catch (or left unhandled, allowing the workflow to transition into a FAILED runtime status).
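To illustrate the expected pattern, here is a minimal self-contained sketch where the activity call is simulated by a direct function call (not the real Dapr SDK `ctx.callActivity` API; names are taken from the quickstart for illustration only):

```typescript
// Sketch of the expected behavior: an error thrown in an activity
// should be catchable by the workflow. The activity invocation is
// simulated with a plain async call, not the actual SDK API.
async function verifyInventoryActivity(): Promise<string> {
  throw new Error("Something went wrong!");
}

async function orderProcessingWorkflow(): Promise<string> {
  try {
    // In the real SDK this would be something like `yield ctx.callActivity(...)`.
    await verifyInventoryActivity();
    return "Processed";
  } catch (e) {
    // The error should propagate here so the workflow can handle it gracefully.
    return `Handled: ${(e as Error).message}`;
  }
}
```

If the catch branch is omitted, the rethrown error should instead fail the workflow.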

Actual Behavior

  • Propagation of the error fails.
  • The Workflow remains in a RUNNING state and is now stuck.

Steps to Reproduce the Problem

Using the Dapr QuickStarts, modify one of the Activities within orderProcessingWorkflow.ts to throw an error, e.g. throw new Error("Something went wrong!");

Build the app, and start a new workflow.

Observe the logs:

== APP - order-processor == Verifying inventory for b6c25c32-1d7c-4e5d-8178-852fdfe324c5 of 1 car
== APP - order-processor == Error: Something went wrong!
== APP - order-processor ==     at verifyInventoryActivity (/Users/olivertomlinson/Desktop/dev/quickstarts/workflows/javascript/sdk/dist/orderProcessingWorkflow.js:17:11)
== APP - order-processor ==     at activityWrapper (/Users/olivertomlinson/Desktop/dev/quickstarts/workflows/javascript/sdk/node_modules/@dapr/dapr/workflow/runtime/WorkflowRuntime.js:85:20)
== APP - order-processor ==     at ActivityExecutor.execute (/Users/olivertomlinson/Desktop/dev/quickstarts/workflows/javascript/sdk/node_modules/@dapr/durabletask-js/worker/activity-executor.js:21:30)
== APP - order-processor ==     at TaskHubGrpcWorker._executeActivity (/Users/olivertomlinson/Desktop/dev/quickstarts/workflows/javascript/sdk/node_modules/@dapr/durabletask-js/worker/task-hub-grpc-worker.js:237:43)
== APP - order-processor ==     at ClientReadableStreamImpl.<anonymous> (/Users/olivertomlinson/Desktop/dev/quickstarts/workflows/javascript/sdk/node_modules/@dapr/durabletask-js/worker/task-hub-grpc-worker.js:132:26)
== APP - order-processor ==     at ClientReadableStreamImpl.emit (node:events:512:28)
== APP - order-processor ==     at addChunk (node:internal/streams/readable:324:12)
== APP - order-processor ==     at readableAddChunk (node:internal/streams/readable:297:9)
== APP - order-processor ==     at Readable.push (node:internal/streams/readable:234:10)
== APP - order-processor ==     at Object.onReceiveMessage (/Users/olivertomlinson/Desktop/dev/quickstarts/workflows/javascript/sdk/node_modules/@grpc/grpc-js/build/src/client.js:349:24)
== APP - order-processor == An error occurred while trying to execute activity 'verifyInventoryActivity': Something went wrong!
== APP - order-processor == Failed to deliver activity response for 'verifyInventoryActivity#2' of orchestration ID 'b6c25c32-1d7c-4e5d-8178-852fdfe324c5' to sidecar: 2 UNKNOWN: unknown instance ID/task ID combo: /2

Fetch the status of the workflow and notice that it is still in a RUNNING state:

{
    "instanceID": "b6c25c32-1d7c-4e5d-8178-852fdfe324c5",
    "workflowName": "orderProcessingWorkflow",
    "createdAt": "2025-07-04T08:57:26.547359Z",
    "lastUpdatedAt": "2025-07-04T08:57:26.576156Z",
    "runtimeStatus": "RUNNING",
    "properties": {
        "dapr.workflow.input": "{\"itemName\":\"car\",\"totalCost\":5000,\"quantity\":1}"
    }
}

Note: if you stop and restart the app, you will see the same error.

olitomlinson avatar Jul 04 '25 09:07 olitomlinson

@cicoyle what's your reasoning for demoting this to P1?

AFAIK a stuck workflow that can only be abandoned/terminated, with no mitigation or workaround, should be a P0.

olitomlinson avatar Aug 04 '25 23:08 olitomlinson

She changed it with my go-ahead. I've changed it back to a P0 on this repo, but left the P1 on the release board itself since it isn't release-blocking; as you indicated, though, it is an important blocker on this SDK itself.

I'm trying to get some of the infrastructure fixed in this SDK and then I'll see if I can't get to this as part of this release as well.

WhitWaldo avatar Aug 05 '25 15:08 WhitWaldo

@JoshVanL I'm showing that in both the .NET and JS SDKs, if there's an error, it's swallowed in the SDKs and the failureDetails property is populated on the ActivityResponse. Is there anything more necessary from the SDK to indicate a failure state (e.g., must it be formatted in some particular way) to notify the runtime that the activity has entered a failed state and is not actually running still?

WhitWaldo avatar Sep 25 '25 23:09 WhitWaldo

@WhitWaldo upon an error, the SDK must report that the workflow is in a FAILED state. It is the responsibility of the SDK implementation to bubble the error up to the worker and report the failure to daprd. Here is the implementation in Go:

	if err != nil {
		resp.Actions = []*protos.OrchestratorAction{
			{
				Id: -1,
				OrchestratorActionType: &protos.OrchestratorAction_CompleteOrchestration{
					CompleteOrchestration: &protos.CompleteOrchestrationAction{
						OrchestrationStatus: protos.OrchestrationStatus_ORCHESTRATION_STATUS_FAILED,
						Result:              wrapperspb.String("An internal error occured while executing the orchestration."),
						FailureDetails: &protos.TaskFailureDetails{
							ErrorType:    fmt.Sprintf("%T", err),
							ErrorMessage: err.Error(),
						},
					},
				},
			},
		}
	}

Activities do not inherently cause the entire workflow to fail. A workflow should fail if the workflow orchestrator func returns an error. An activity failure can be caught with try/catch in each SDK.
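For the JS SDK, the analogous fix would be to emit a FAILED completion action when the orchestrator errors. A hypothetical sketch of that shape in TypeScript, using plain object types that mirror the Go protos above (not the actual generated classes from @dapr/durabletask-js):

```typescript
// Hypothetical sketch: the failure-reporting shape from the Go snippet
// above, translated to plain TypeScript objects for illustration.
// These are NOT the real @dapr/durabletask-js protobuf types.
interface TaskFailureDetails {
  errorType: string;
  errorMessage: string;
}

interface CompleteOrchestrationAction {
  orchestrationStatus: "FAILED" | "COMPLETED";
  result: string;
  failureDetails?: TaskFailureDetails;
}

// On an orchestrator error, build a FAILED completion action so daprd
// can transition the workflow out of RUNNING.
function buildFailureAction(err: Error): CompleteOrchestrationAction {
  return {
    orchestrationStatus: "FAILED",
    result: "An internal error occurred while executing the orchestration.",
    failureDetails: {
      errorType: err.name,
      errorMessage: err.message,
    },
  };
}
```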

This is the output when running in Go:

	f.workflow.Registry().AddOrchestratorN("foo", func(ctx *task.OrchestrationContext) (any, error) {
		return nil, errors.New("intentional failure")
	})

	client := f.workflow.BackendClient(t, ctx)

	id, err := client.ScheduleNewOrchestration(ctx, "foo")
	require.NoError(t, err)
	meta, err := client.WaitForOrchestrationCompletion(ctx, id)
	require.NoError(t, err)
	assert.Equal(t, api.RUNTIME_STATUS_FAILED.String(), meta.RuntimeStatus.String())
	fmt.Printf(">>%s\n", meta)
>>instanceId:"17b44b54-db35-4548-a0a6-5e4d45adc6e2" name:"foo" runtimeStatus:ORCHESTRATION_STATUS_FAILED createdAt:{seconds:1759228916 nanos:170976874} lastUpdatedAt:{seconds:1759228919 nanos:182428853} failureDetails:{errorType:"*errors.errorString" errorMessage:"intentional failure"}

JoshVanL avatar Sep 30 '25 10:09 JoshVanL