dotnet-sdk [Workflow] gRPC connection to workflow runtime doesn't self-heal when app restarts

cc @philliphoff

runtime 1.13.2 (not tried any other versions)

Expected Behavior

The gRPC connection to the workflow runtime will reestablish after the app process (not dapr process) crashes and is restarted.

Actual Behavior

The gRPC connection to the workflow runtime does not reestablish after the app process (not dapr process) crashes and is restarted.

Steps to Reproduce the Problem

Pull down my repro here: https://github.com/olitomlinson/dapr-workflow-examples

Run docker compose -f compose-1-instance-3-schedulers.yml build
Run docker compose -f compose-1-instance-3-schedulers.yml up
Stop the app container in compose (it will be named something like workflow-app-a-1)
Start the app container in compose again
Observe the logs in workflow-app-a-1 and you will see the following error repeating forever:

The gRPC server for Durable Task gRPC worker is unavailable. Will continue retrying.

Release Note

RELEASE NOTE:

May 22 '24 18:05 olitomlinson

This may have been fixed already in 1.14 as part of pulling in some fixes in durabletask-go. @olitomlinson are you able to verify?

Sep 12 '24 00:09 cgillum

This may have been fixed already in 1.14 as part of pulling in some fixes in durabletask-go. @olitomlinson are you able to verify?

Still an issue in 1.14.4

Sep 17 '24 22:09 olitomlinson

I find this confusing. For the go-sdk I made the client retry the worker connection to dapr indefinitely, and I think we should have that behavior in every SDK. I believe python already has it.

Oct 10 '24 11:10 famarting
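
For readers following along, the retry-forever behavior described above can be sketched roughly as below. This is an illustrative sketch only, not code from the go-sdk, the python SDK, or the dotnet-sdk: the function name `run_worker_with_retry`, the `connect` callable, and the backoff parameters are all hypothetical, and it assumes a failed connection surfaces as a `ConnectionError`.

```python
import time
from typing import Callable, Optional


def run_worker_with_retry(
    connect: Callable[[], None],
    *,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: Optional[int] = None,
) -> int:
    """Call connect() until it returns without raising.

    `connect` is a hypothetical stand-in for whatever opens the worker's
    gRPC stream to the sidecar. Returns the number of attempts used.
    With max_attempts=None (the default) the loop retries indefinitely.
    """
    delay = initial_delay
    attempt = 0
    while True:
        attempt += 1
        try:
            connect()
            return attempt
        except ConnectionError:
            # Give up only if the caller asked for a bounded retry count.
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(delay)
            # Exponential backoff, capped so retries never stop entirely.
            delay = min(delay * 2, max_delay)
```

The `sleep` parameter is injectable only so the loop can be exercised without real waiting; a real worker would presumably also need to re-enter this loop whenever an established stream later breaks.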

@olitomlinson Do you know if this is still an issue with the 1.15 RC?

Jan 28 '25 17:01 WhitWaldo

@olitomlinson Do you know if this is still an issue with the 1.15 RC?

@WhitWaldo yes, just tested on 1.15.0-rc.7, and it still exhibits the same behavior :(

Jan 29 '25 22:01 olitomlinson

@WhitWaldo is this still in progress & if so, can you provide an update?

Jun 17 '25 16:06 cicoyle

@cicoyle I've been investigating options for this locally, and while I've made some progress in building out richer debugging tooling and logs to tackle a similar reported issue, I have not yet identified a solid path forward. WIP.

Jun 17 '25 16:06 WhitWaldo

This is still a problem on 1.15.6-rc.5 / dotnet sdk 1.16.0-rc03

The gRPC server for Durable Task gRPC worker is unavailable. Will continue retrying.

Jun 23 '25 18:06 olitomlinson

Still not self-healing in runtime 1.16.0-rc.2 / dotnet sdk 1.16.0-rc05

This could be a real problem in the wild IMO. What happens if, in a Kubernetes deployment, the app container crashes and is subsequently restarted by the kubelet? It would not self-heal until something triggers a restart of the dapr container, leaving a period of time where the pod is not advancing any workflows.

Aug 04 '25 22:08 olitomlinson

Still not self-healing in runtime 1.16.0-rc.25 / dotnet sdk 1.16.0-rc15

Aug 29 '25 20:08 olitomlinson