akka.net

DistributedPubSub SendOneMessageToEachGroup issue after any subscriber becomes unreachable

Open · joonhwan opened this issue 6 years ago · 1 comment

  • Akka.Net Version: 1.3.11
  • Windows 7 / .NET Framework 4.6.x
  • Detailed Description follows

I pushed test code to https://github.com/joonhwan/akka.net-distribute-pubsub-test

The solution contains the following console programs:

  • Seed

  • JobRequester (a console pseudo job requester: any alphabetic string is sent to all JobHandlers, and any numeric string is sent to a single JobHandler)

  • JobHandler: subscribes twice, as shown below: once for 1-to-n delivery, and once (in a group) for 1-to-1 delivery using the sendOneMessageToEachGroup property of the Publish message.

    Mediator.Tell(new Subscribe("echo", Self, "handler")); // for 1-to-1
    Mediator.Tell(new Subscribe("echo", Self));            // for 1-to-n

When I run

  • Seed
  • JobRequester
  • JobHandler (more than two instances; port numbers are allocated automatically)

everything seems to work. If I enter an alphabetic string, it is broadcast and every JobHandler receives it. If I enter a numeric string, it is sent round-robin and only one JobHandler receives it. No messages are dropped.
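To make the two delivery modes concrete, here is a minimal, hypothetical in-memory model written in plain C# with no Akka.NET dependency. PubSubModel and its members are illustrative names only; this is not how DistributedPubSubMediator is implemented. Under the stated assumption, subscribers registered without a group receive every plain publish, while a publish with sendOneMessageToEachGroup set picks one subscriber per group, round-robin, as described above.

```csharp
using System.Collections.Generic;

// Hypothetical model of the two delivery modes described in this issue.
// NOT Akka.NET's DistributedPubSubMediator; names and structure are assumptions.
public class PubSubModel
{
    // topic -> subscribers registered without a group (1-to-n delivery)
    private readonly Dictionary<string, List<string>> _ungrouped =
        new Dictionary<string, List<string>>();

    // topic -> group name -> subscribers registered in that group (1-to-1 delivery)
    private readonly Dictionary<string, Dictionary<string, List<string>>> _groups =
        new Dictionary<string, Dictionary<string, List<string>>>();

    // round-robin position per "topic/group" key
    private readonly Dictionary<string, int> _cursor = new Dictionary<string, int>();

    public void Subscribe(string topic, string subscriber, string group = null)
    {
        if (group == null)
        {
            if (!_ungrouped.TryGetValue(topic, out var list))
                _ungrouped[topic] = list = new List<string>();
            if (!list.Contains(subscriber)) list.Add(subscriber);
            return;
        }

        if (!_groups.TryGetValue(topic, out var byGroup))
            _groups[topic] = byGroup = new Dictionary<string, List<string>>();
        if (!byGroup.TryGetValue(group, out var members))
            byGroup[group] = members = new List<string>();
        if (!members.Contains(subscriber)) members.Add(subscriber);
    }

    // Returns the subscribers that would receive one message published to `topic`.
    public List<string> Publish(string topic, bool sendOneMessageToEachGroup = false)
    {
        if (!sendOneMessageToEachGroup)
            return _ungrouped.TryGetValue(topic, out var all)
                ? new List<string>(all)   // broadcast: every ungrouped subscriber
                : new List<string>();

        var recipients = new List<string>();
        if (!_groups.TryGetValue(topic, out var byGroup)) return recipients;

        foreach (var entry in byGroup)
        {
            if (entry.Value.Count == 0) continue; // nothing to route to in this group
            var key = topic + "/" + entry.Key;
            _cursor.TryGetValue(key, out var next); // 0 if this group was never used
            recipients.Add(entry.Value[next % entry.Value.Count]);
            _cursor[key] = next + 1;
        }
        return recipients;
    }
}
```

For example, subscribing h1, h2 and h3 to "echo" both with and without the group "handler" makes a plain publish reach all three, while four grouped publishes go to h1, h2, h3 and then h1 again.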

Issue 1

But if I close one of the JobHandler consoles, any 1-to-1 message that would have been routed to the closed JobHandler is dropped.

Issue 2

When I close all JobHandlers and then start a new JobHandler that joins the cluster, that JobHandler does not receive any messages from JobRequester.

Any hints or guidance would be appreciated.

joonhwan, Feb 07 '19 07:02

I ran into the same problem on Akka 1.3.18, using DData for state sync.

We create a short-lived actor that subscribes to a topic, using a unique GUID as the group name. Another actor on a different node publishes to this topic with the flag sendOneMessageToEachGroup = true. The first time this actor is created, it works fine and receives messages. After a while it stops itself. When another instance is created later, it subscribes with a new GUID and gets the subscription ack, but receives no messages. When a message is published to this topic, a NullReferenceException is thrown in the DistributedPubSubMediator on the publishing node. Stack trace:

System.NullReferenceException: Object reference not set to an instance of an object.
   at Akka.Routing.Router.Send(Routee routee, Object message, IActorRef sender)
   at Akka.Cluster.Tools.PublishSubscribe.DistributedPubSubMediator.PublishToEachGroup(String path, Object message)
   at lambda_method(Closure , Object , Action`1 , Action`1 , Action`1 , Action`1 , Action`1 , Action`1 , Action`1 , Action`1 , Action`1 , Action`1 , Action`1 , Action`1 , Action`1 , Action`1 , Object[] )
   at Akka.Tools.MatchHandler.PartialHandlerArgumentsCapture`16.Handle(T value)
   at Akka.Actor.ReceiveActor.ExecutePartialMessageHandler(Object message, PartialAction`1 partialAction)
   at Akka.Actor.UntypedActor.Receive(Object message)
   at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
   at Akka.Actor.ActorCell.ReceiveMessage(Object message)
   at Akka.Actor.ActorCell.Invoke(Envelope envelope)

I could not reproduce this in a unit test. That test did not use clustering or remoting (it was pub/sub within a single ActorSystem), so I suspect this only reproduces when the actors are on different nodes.

Thinking about it, this may be a BIG deal. Does pub/sub break whenever we stop a subscribed actor? We definitely subscribe from cluster singletons, so could these errors occur when the oldest node goes down and the singletons move? I never noticed this in particular, but in those cases the group name stays the same (not a random GUID as above), so maybe the problem only appears when the last actor of a group goes away?

I also noticed there was a fix for pub/sub actor termination handling in v1.4. Could that be related, and if so, can it be backported?
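The subscriber lifecycle discussed in this thread can be sketched as a minimal, hypothetical model of the group bookkeeping it appears to expect: when a grouped subscriber stops, it is removed from its group, and a group left empty is dropped. Under that assumption, round-robin never selects a stopped member (Issue 1), and a later subscriber with a fresh GUID group still receives messages (the scenario above, and Issue 2). GroupRegistry, Terminated and PublishToEachGroup are illustrative names in plain C#, not Akka.NET's implementation, and the model does not claim to explain the NullReferenceException.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of group bookkeeping when a grouped subscriber stops.
// Assumption: stopped subscribers leave their group and empty groups are pruned.
// This is NOT Akka.NET source code.
public class GroupRegistry
{
    // group name -> live subscribers in that group
    private readonly Dictionary<string, List<string>> _groups =
        new Dictionary<string, List<string>>();

    // round-robin position per group
    private readonly Dictionary<string, int> _cursor = new Dictionary<string, int>();

    public int GroupCount => _groups.Count;

    public void Subscribe(string group, string subscriber)
    {
        if (!_groups.TryGetValue(group, out var members))
            _groups[group] = members = new List<string>();
        if (!members.Contains(subscriber)) members.Add(subscriber);
    }

    // Called when a subscriber stops (modeling a termination notification).
    public void Terminated(string subscriber)
    {
        foreach (var name in _groups.Keys.ToList()) // copy keys: we may remove entries
        {
            _groups[name].Remove(subscriber);
            if (_groups[name].Count == 0)
            {
                _groups.Remove(name); // prune the empty group entirely
                _cursor.Remove(name);
            }
        }
    }

    // One recipient per remaining group, round-robin within each group.
    public List<string> PublishToEachGroup()
    {
        var recipients = new List<string>();
        foreach (var entry in _groups)
        {
            _cursor.TryGetValue(entry.Key, out var next); // 0 if never used
            recipients.Add(entry.Value[next % entry.Value.Count]);
            _cursor[entry.Key] = next + 1;
        }
        return recipients;
    }
}
```

Because empty groups are removed rather than kept around, every group that PublishToEachGroup visits has at least one live member, so the modulo never divides by zero and no stopped subscriber is ever selected.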

ondrejpialek, Jul 01 '20 11:07

This problem has already been resolved.

Arkatufus, Oct 17 '22 15:10

@Arkatufus I don't think this issue has been fully solved. We still get the following exception when using sendOneMessageToEachGroup = true in Akka.NET v1.4.48:

System.NullReferenceException: Object reference not set to an instance of an object.
   at Akka.Routing.Router.Send(Routee routee, Object message, IActorRef sender)
   at Akka.Cluster.Tools.PublishSubscribe.DistributedPubSubMediator.PublishToEachGroup(String path, Publish publish)
   at Akka.Cluster.Tools.PublishSubscribe.DistributedPubSubMediator.<.ctor>b__15_2(Publish publish)
   at lambda_method43(Closure , Object , Action`1 , Action`1 , Action`1 , Action`1 , Action`1 , Action`1 , Action`1 , Action`1 , Action`1 , Action`1 , Action`1 , Action`1 , Action`1 , Action`1 , Object[] )
   at Akka.Tools.MatchHandler.PartialHandlerArgumentsCapture`16.Handle(T value)
   at Akka.Actor.ReceiveActor.OnReceive(Object message)
   at Akka.Actor.UntypedActor.Receive(Object message)
   at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
   at Akka.Actor.ActorCell.ReceiveMessage(Object message)
   at Akka.Actor.ActorCell.Invoke(Envelope envelope)

I'll try to provide more details, and perhaps open a new issue for it.

lucavice, Feb 23 '23 16:02