Orleans client retry logic for "not stable to perform the lookup for grain"

Open oleksandr-bilyk opened this issue 1 month ago • 1 comments

In one of our services we host both an Orleans silo (server) and Orleans clients. We use an ASP.NET BackgroundService/IHostedService to run actor health probes in the background. In the first few seconds after silo startup, the actor clients crash with exceptions similar to this one:

Exception has occurred: CLR/Orleans.Runtime.OrleansException
Exception thrown: 'Orleans.Runtime.OrleansException' in System.Private.CoreLib.dll: 'Current directory at S10.255.255.254:11111:123033869 is not stable to perform the lookup for grainId messageprocessor/0e1fc049508e4757beb484a1028bc96a (it maps to S10.255.255.254:11111:123033821, which is not a valid silo). Retry later.'
   at Orleans.Runtime.GrainDirectory.LocalGrainDirectory.<LookupAsync>d__51.MoveNext()
   at Orleans.Runtime.GrainDirectory.DhtGrainLocator.<Lookup>d__6.MoveNext()
   at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
   at Orleans.Runtime.Placement.PlacementService.PlacementWorker.<GetOrPlaceActivationAsync>d__12.MoveNext()
   at Orleans.Runtime.Placement.PlacementService.PlacementWorker.AddressWaitingMessages(GrainPlacementWorkItem completedWorkItem)

I understand that the actor client cannot have universal retry logic, because not every actor has idempotent logic. However, this particular exception says that the directory "is not stable to perform the lookup for grainId" and that the caller should "Retry later." That's why, IMO, it would be good to add retry logic for this case.
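
Until something like that exists in Orleans, a caller-side workaround is possible. The following is only a minimal sketch, not an Orleans API: `TransientRetry`, `ExecuteAsync`, and `IsUnstableDirectory` are hypothetical names, the attempt count and delays are arbitrary, and matching on the message text is an assumption based on the exception shown above (it would break if the wording changes). It retries only when the predicate matches, so other failures still surface immediately.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of Orleans): retries an operation only when
// the supplied predicate classifies the failure as transient.
public static class TransientRetry
{
    public static async Task<T> ExecuteAsync<T>(
        Func<Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts = 5,
        TimeSpan? initialDelay = null,
        CancellationToken cancellationToken = default)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        var delay = initialDelay ?? TimeSpan.FromMilliseconds(200);

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation().ConfigureAwait(false);
            }
            // The filter lets non-transient errors, and the final failed
            // attempt, propagate to the caller unchanged.
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                await Task.Delay(delay, cancellationToken).ConfigureAwait(false);
                delay += delay; // double the wait before the next attempt
            }
        }
    }

    // Assumption: the message text is the only signal available, since the
    // exception in the trace above is a plain OrleansException.
    public static bool IsUnstableDirectory(Exception ex) =>
        ex.Message.Contains("is not stable to perform the lookup", StringComparison.Ordinal);
}
```

A health probe could then wrap its grain call as `await TransientRetry.ExecuteAsync(() => grain.PingAsync(), TransientRetry.IsUnstableDirectory)`, where `grain.PingAsync()` stands in for whatever method the probe actually calls. Since the lookup fails before the grain is activated, retrying in this case should not conflict with the idempotency concern, though that is my reading of the stack trace rather than a documented guarantee.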

oleksandr-bilyk, Nov 26 '25 00:11

Agreed that it would be nice if we could define what our behavior should be in this case - even just a more specific exception type might be valuable.

insylogo, Dec 10 '25 16:12