akka.net
AkkaSystem unclean termination
Version Information
akka.net 1.5.13
Describe the bug
Akka shutdown/termination in fatal situations does not trigger the final stop signals and callbacks. This includes AkkaSystem.WhenTerminated and System.RegisterOnTermination().
As a result, AkkaSystem.WhenTerminated never completes (LivenessHealthCheck: Healthy) and no ApplicationStop can be triggered via System.RegisterOnTermination(StopApplication). (Downed node with ReadinessHealthCheck: Unhealthy)
To Reproduce
Terminate and/or CoordinatedShutdown a system in OOM situation.
Expected behavior
The AkkaSystem.RegisterOnTermination callbacks should trigger even after an unsuccessful CoordinatedShutdown or AkkaSystem.Terminate().
Actual behavior
CoordinatedShutdown and/or AkkaSystem.Terminate() throw, no AkkaSystem.RegisterOnTermination callbacks are executed, and AkkaSystem.WhenTerminated does not complete.
Environment
ubuntu-jammy, Docker Desktop and k8s
I can't really reproduce the bug. Maybe there are other things that cause the actor system to fail?
Terminate and/or CoordinatedShutdown a system in OOM situation.
So I missed this - but this is largely an unhandleable situation and CoordinatedShutdown won't run correctly because processes are aborted when this occurs. Catastrophic runtime failures can't be handled gracefully through the normal pathways we use to handle graceful terminations. The solution here is to fix the OOM.
See https://learn.microsoft.com/en-us/dotnet/api/system.outofmemoryexception?view=net-7.0 for a fuller explanation of what you can do to log this type of error (Environment.FailFast), but there are no tools to handle it once it gets going.
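As a rough illustration of the logging approach mentioned above, the sketch below catches an OutOfMemoryException and hands it to Environment.FailFast, which writes the message to the Windows Application event log (where available) and terminates the process immediately. The `FatalGuard` class and its `Run` method are hypothetical names used only for this example. They are not part of Akka.NET, and this does not make the actor system shut down gracefully.

```csharp
using System;

// Hypothetical helper, not an Akka.NET API.
public static class FatalGuard
{
    // Runs `work`; if it throws OutOfMemoryException, records the failure
    // and ends the process right away instead of attempting recovery.
    public static void Run(Action work)
    {
        try
        {
            work();
        }
        catch (OutOfMemoryException e)
        {
            // FailFast does not run finally blocks or finalizers.
            Environment.FailFast("Out of memory; terminating process.", e);
        }
    }
}
```

Because FailFast skips normal cleanup, this only makes the failure visible. An orchestrator such as Kubernetes would then see the container exit and restart it.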
It's not that the OOM exception should be handled explicitly, but system.WhenTerminated should be completed even after an exception in system.Terminate() itself.
The callbacks registered in System.RegisterOnTermination(StopApplication) could/should be executed even after an exception in CoordinatedShutdown.
Otherwise, system.Terminate() and/or CoordinatedShutdown break the system state: system.WhenTerminated never completes and no System.RegisterOnTermination(StopApplication) callbacks get executed.
My end result was that my LiveHealthCheck on system.WhenTerminated was successful but the ReadyHealthCheck on the akka cluster was in a failure state. (Kubernetes)
In my opinion system.Terminate() should have the same behavior as Dispose(): even when it throws internally, the end result is that the instance is disposed.
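To make the requested Dispose()-like semantics concrete, here is a minimal sketch of a stand-in class whose Terminate() always runs the registered callbacks and completes WhenTerminated, even when the shutdown logic throws. `SketchSystem` and its constructor parameter are hypothetical and only mirror the member names discussed in this issue. This is not how Akka.NET implements ActorSystem.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for an actor system; NOT Akka.NET's implementation.
public sealed class SketchSystem
{
    private readonly List<Action> _onTermination = new();
    private readonly TaskCompletionSource _terminated =
        new(TaskCreationOptions.RunContinuationsAsynchronously);
    private readonly Func<Task> _shutdown;

    // `shutdown` represents the internal shutdown logic, which may throw.
    public SketchSystem(Func<Task> shutdown) => _shutdown = shutdown;

    public Task WhenTerminated => _terminated.Task;

    public void RegisterOnTermination(Action callback) => _onTermination.Add(callback);

    public async Task Terminate()
    {
        try
        {
            await _shutdown();
        }
        finally
        {
            // Runs whether or not shutdown threw.
            foreach (var callback in _onTermination)
            {
                try { callback(); }
                catch (Exception) { /* one failing callback must not block the rest */ }
            }
            _terminated.TrySetResult();
        }
    }
}
```

With this shape, the exception from the shutdown logic still reaches the caller of Terminate(), but a liveness check that awaits WhenTerminated and a StopApplication callback both observe the termination.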
but system.WhenTerminated should be completed even after an exception in system.Terminate() itself. The callbacks registered in System.RegisterOnTermination(StopApplication) could/should be executed even after an exception in CoordinatedShutdown
Ok, that is fixable. We can do that.
@Zetanova can you look at #6967 and check if I missed anything, I could not reproduce your error.
Any updates on this @Zetanova ?