Consider adding some basic preflight checks to sidecar health when workflows are invoked
In what area(s)?
/area runtime
Describe the feature
(using dotnet sdk)
When trying to start a workflow in an app whose sidecar doesn't have an actor state store configured, the error message returned is generic. Example:
var instanceId = await _client.ScheduleNewWorkflowAsync(
nameof(MyWorkflow),
input: workflowData);
This call works with local Dapr. But when deployed elsewhere, it is easy to forget that workflows require a configured state store with the actor flag set on it. The result is the following exception:
Grpc.Core.RpcException: Status(StatusCode="Unknown", Detail="did not find address for actor dapr.internal.my-namespace.my-pod-console.workflow/aa000a3dc407400cb035d25ae9edb0c4")
at Microsoft.DurableTask.Client.Grpc.GrpcDurableTaskClient.ScheduleNewOrchestrationInstanceAsync(TaskName orchestratorName, Object input, StartOrchestrationOptions options, CancellationToken cancellation)
This exception may occur in multiple situations, which obscures the underlying problem. It seems trivial to validate that an actor state store is configured and to return a helpful message like "The workflow 'MyWorkflow' is registered, but no actor-compatible state store was found. Please configure one and retry the operation."
Release Note
RELEASE NOTE:
Add more helpful error messages for workflow misconfiguration(s).
This sounds drastic, but I would be in favour of crashing the entire sidecar init process if it detected that you were trying to use Actors / Workflows, but had not yet configured an Actor-capable state store.
This might be a great opportunity to expand the Health API to facilitate the SDKs to do their own checks of which runtime services are live and ready instead of having a single endpoint that presumably covers all scenarios.
I happened to find a project that takes this approach and provides more robust per-service health checks.
At app or SDK startup, the SDK could call the v1.0/metadata endpoint, which would indicate whether an actor-compatible state store is registered.
If not registered, the SDK can warn/error/crash.
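A minimal sketch of that SDK-side check, written in Python only because the metadata endpoint is plain HTTP. It assumes the metadata response contains a `components` array whose entries carry `name`, `type`, and a `capabilities` list, and that actor-capable state stores report an `ACTOR` capability. Note that this capability says the store can back actors, not that `actorStateStore` is actually enabled on it, so treat the result as a hint rather than proof. The function names and the default port in `METADATA_URL` are illustrative, not part of any SDK.

```python
import json
from typing import List, Optional
from urllib.request import urlopen

# Assumption: default sidecar HTTP port; real apps usually read DAPR_HTTP_PORT.
METADATA_URL = "http://localhost:3500/v1.0/metadata"


def actor_capable_state_stores(metadata: dict) -> List[str]:
    """Return names of state store components that report the ACTOR capability."""
    names = []
    for component in metadata.get("components") or []:
        is_state_store = str(component.get("type", "")).startswith("state.")
        capabilities = component.get("capabilities") or []
        if is_state_store and "ACTOR" in capabilities:
            names.append(component.get("name", ""))
    return names


def preflight_message(workflow_name: str, metadata: dict) -> Optional[str]:
    """Return a descriptive error if no actor-capable store is listed, else None."""
    if actor_capable_state_stores(metadata):
        return None
    return (
        f"The workflow '{workflow_name}' is registered, but no actor-compatible "
        "state store was found. Please configure one and retry the operation."
    )


def fetch_metadata(url: str = METADATA_URL) -> dict:
    """Fetch and parse the sidecar metadata (requires a running sidecar)."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)
```

At startup an app could call `preflight_message("MyWorkflow", fetch_metadata())` and then warn, raise, or exit if a message comes back.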
This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.
Bump.
Not sure what we do with this, @WhitWaldo
I'm not the authority here, but I still think exiting dapr gracefully would be the way to go.
I don't think health checks are the solution here, as there being no actor state store component loaded is not a transient issue that the sidecar can recover from given time. Transient issues are fine for health checks, but not a misconfiguration scenario like this.
@olitomlinson
I beg to differ -- the actor state store is loaded asynchronously with respect to the app, so it may well be a transient issue. It's perfectly possible to write a dotnet app that tries to start a workflow before the sidecar has loaded the state store.
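To illustrate the transient case, here is a small sketch of an app-side wait loop that polls a readiness probe until it succeeds or a deadline passes. The `check` callable is hypothetical: it stands in for whatever probe the app uses, such as a call to the sidecar's metadata or health endpoint. A timeout here cannot tell "not loaded yet" apart from "never configured"; it only bounds how long the app waits before reporting the problem.

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `check` until it returns True or `timeout_s` elapses.

    Returns True if the probe succeeded, False if the deadline passed first.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

An app would call this before scheduling its first workflow and surface a descriptive error if it returns False.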
The original problem was:
"When trying to start a workflow in an app when the sidecar doesn't have an actor state store configured"
I interpreted this as meaning that either no state store component YAML is loaded, or one is loaded but is not configured with the actorStateStore: true metadata.
Is that not what the cause of this issue was?
"It's perfectly possible to write a dotnet app that tries to start a workflow before the sidecar has loaded the state store."
That is a problem today, yes, which is being addressed by the work that @JoshVanL is doing to make sure that any requests to the Workflow runtime are blocked until the workflow runtime is ready.
This issue has been automatically closed because it has not had activity in the last 67 days. If this issue is still valid, please ping a maintainer and ask them to label it as pinned, good first issue, help wanted or triaged/resolved. Thank you for your contributions.