akka.net

AkkaSystem unclean termination

Open Zetanova opened this issue 1 year ago • 8 comments

Version Information: akka.net 1.5.13

Describe the bug: Akka shutdown/termination in fatal situations does not trigger the final stop signals and callbacks. This includes AkkaSystem.WhenTerminated and System.RegisterOnTermination().

As a result, AkkaSystem.WhenTerminated never completes (LivenessHealthCheck: Healthy) and no application stop can be triggered via System.RegisterOnTermination(StopApplication) (downed node with ReadinessHealthCheck: Unhealthy).

To Reproduce: Terminate and/or run CoordinatedShutdown on a system in an OOM situation.

Expected behavior: The AkkaSystem.RegisterOnTermination callbacks should trigger even after an unsuccessful CoordinatedShutdown or AkkaSystem.Terminate().

Actual behavior: CoordinatedShutdown and/or AkkaSystem.Terminate() throw, no AkkaSystem.RegisterOnTermination callbacks are executed, and AkkaSystem.WhenTerminated does not complete.

Environment: ubuntu-jammy, Docker Desktop and k8s

Zetanova avatar Oct 05 '23 12:10 Zetanova

I can't really reproduce the bug. Maybe there are other things that cause the actor system to fail?

Arkatufus avatar Oct 23 '23 19:10 Arkatufus

> Terminate and/or CoordinatedShutdown a system in OOM situation.

So I missed this - but this is largely an unhandleable situation and CoordinatedShutdown won't run correctly because processes are aborted when this occurs. Catastrophic runtime failures can't be handled gracefully through the normal pathways we use to handle graceful terminations. The solve here is to fix the OOM.

Aaronontheweb avatar Oct 24 '23 14:10 Aaronontheweb

See https://learn.microsoft.com/en-us/dotnet/api/system.outofmemoryexception?view=net-7.0 for a fuller explanation on what you can do to log this type of error (Environment.FailFast), but there are no tools to handle it once it gets going.
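For illustration, the logging approach mentioned above can be sketched as follows. This is a minimal sketch, not something from Akka.NET: `FatalGuard` and its `Run` method are hypothetical names, and the only real API used is `Environment.FailFast(string, Exception)`. It assumes the `OutOfMemoryException` actually reaches the catch block, which is not guaranteed once the process is out of memory.

```csharp
using System;

// Hypothetical helper (not part of Akka.NET): wraps a unit of work and
// fails fast when the runtime reports an out-of-memory condition.
public static class FatalGuard
{
    public static void Run(Action work)
    {
        try
        {
            work();
        }
        catch (OutOfMemoryException ex)
        {
            // FailFast logs the message and terminates the process immediately,
            // without running finally blocks or finalizers.
            Environment.FailFast("Out of memory: " + ex.Message, ex);
        }
        // Any other exception type propagates to the caller unchanged.
    }
}
```

Usage would look like `FatalGuard.Run(() => DoWork());`, where `DoWork` is whatever the application runs. The sketch only records the failure and stops the process; it does not make the shutdown graceful.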

Aaronontheweb avatar Oct 24 '23 14:10 Aaronontheweb

It's not that the OOM exception should be handled explicitly, but system.WhenTerminated should complete even after an exception in system.Terminate() itself. The callbacks registered in System.RegisterOnTermination(StopApplication) could/should be executed even after an exception in CoordinatedShutdown.

Otherwise, system.Terminate() and/or CoordinatedShutdown will break the system state: system.WhenTerminated never completes and no System.RegisterOnTermination(StopApplication) callbacks get executed.

My end result (on Kubernetes) was that my LiveHealthCheck on system.WhenTerminated was successful, but the ReadyHealthCheck on the Akka cluster was in a failure state.

Zetanova avatar Oct 24 '23 19:10 Zetanova

In my opinion, system.Terminate() should behave like Dispose(): even when it throws internally, the instance should still end up in the disposed state.
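The behavior being asked for can be sketched roughly as below. This is a minimal illustration under stated assumptions, not Akka.NET's implementation and not the eventual fix: `SketchSystem` is a hypothetical stand-in whose members only mirror the names `Terminate`, `WhenTerminated` and `RegisterOnTermination`, and the failing shutdown is simulated with an ordinary exception rather than a real OOM.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for an actor system; NOT Akka.NET's implementation.
public sealed class SketchSystem
{
    private readonly List<Action> _onTermination = new List<Action>();
    private readonly TaskCompletionSource<bool> _terminated =
        new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
    private readonly Action _shutdownWork;

    // shutdownWork simulates the internal shutdown steps, which may throw.
    public SketchSystem(Action shutdownWork) => _shutdownWork = shutdownWork;

    public Task WhenTerminated => _terminated.Task;

    public void RegisterOnTermination(Action callback) => _onTermination.Add(callback);

    public Task Terminate()
    {
        try
        {
            _shutdownWork();
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine("Shutdown failed: " + ex.Message);
        }
        finally
        {
            // Dispose-like guarantee: callbacks run and the termination task
            // completes whether or not the shutdown work threw.
            foreach (var callback in _onTermination)
            {
                try { callback(); }
                catch (Exception ex) { Console.Error.WriteLine("Callback failed: " + ex.Message); }
            }
            _terminated.TrySetResult(true);
        }
        return _terminated.Task;
    }
}
```

With this shape, a liveness check that awaits `WhenTerminated` and a `StopApplication` callback registered through `RegisterOnTermination` would both fire even when the shutdown work throws. Whether the same can be guaranteed under a genuine OOM is a separate question, since the runtime may abort before any of this code runs.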

Zetanova avatar Oct 24 '23 19:10 Zetanova

> but system.WhenTerminated should be completed even after an exception in system.Terminate() itself. The callbacks registered in System.RegisterOnTermination(StopApplication) could/should be executed even after an exception in CoordinatedShutdown

Ok, that is fixable. We can do that.

Aaronontheweb avatar Oct 24 '23 20:10 Aaronontheweb

@Zetanova can you look at #6967 and check if I missed anything? I could not reproduce your error.

Arkatufus avatar Oct 24 '23 20:10 Arkatufus

Any updates on this, @Zetanova?

Aaronontheweb avatar Dec 29 '23 15:12 Aaronontheweb