Orleans client retry logic for "not stable to perform the lookup for grain"
In one of our services we host both an Orleans silo (server) and Orleans clients. We use an ASP.NET BackgroundService/IHostedService to run actor health probes in the background. In the first few seconds after silo startup, the actor clients crash with exceptions similar to this:
Exception has occurred: CLR/Orleans.Runtime.OrleansException
Exception thrown: 'Orleans.Runtime.OrleansException' in System.Private.CoreLib.dll: 'Current directory at S10.255.255.254:11111:123033869 is not stable to perform the lookup for grainId messageprocessor/0e1fc049508e4757beb484a1028bc96a (it maps to S10.255.255.254:11111:123033821, which is not a valid silo). Retry later.'
at Orleans.Runtime.GrainDirectory.LocalGrainDirectory.<LookupAsync>d__51.MoveNext()
at Orleans.Runtime.GrainDirectory.DhtGrainLocator.<Lookup>d__6.MoveNext()
at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
at Orleans.Runtime.Placement.PlacementService.PlacementWorker.<GetOrPlaceActivationAsync>d__12.MoveNext()
at Orleans.Runtime.Placement.PlacementService.PlacementWorker.AddressWaitingMessages(GrainPlacementWorkItem completedWorkItem)
I understand that an actor client cannot have universal retry logic, because not every actor has idempotent logic. However, this particular exception says that the directory "is not stable to perform the lookup for grainId" and explicitly ends with "Retry later". That's why, IMO, it would be good to add retry logic for this case.
Agreed that it would be nice if we could define what our behavior should be in this case - even just a more specific exception type might be valuable.
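Until something like that exists, a caller-side workaround is possible. The following is only a minimal sketch, not an Orleans API: `TransientRetry`, `ExecuteAsync`, and `LooksLikeUnstableDirectory` are hypothetical names, and matching on the message text is an assumption based solely on the exception quoted above, so it would break if the wording changes. The attempt count and delays are arbitrary.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of Orleans.
public static class TransientRetry
{
    // Runs `call`, retrying with a doubling delay while `isTransient` returns true.
    // Any other exception, or the last failed attempt, propagates to the caller.
    public static async Task<T> ExecuteAsync<T>(
        Func<Task<T>> call,
        Func<Exception, bool> isTransient,
        int maxAttempts = 5,
        TimeSpan? initialDelay = null,
        CancellationToken cancellationToken = default)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        var delay = initialDelay ?? TimeSpan.FromMilliseconds(200);

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await call().ConfigureAwait(false);
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                await Task.Delay(delay, cancellationToken).ConfigureAwait(false);
                delay += delay; // double the wait before the next attempt
            }
        }
    }

    // Assumption: the only available signal is the message text from the report above.
    // In an Orleans project, also checking `ex is Orleans.Runtime.OrleansException`
    // would narrow the match further.
    public static bool LooksLikeUnstableDirectory(Exception ex) =>
        ex.Message.Contains("is not stable to perform the lookup", StringComparison.Ordinal);
}
```

A health probe could then wrap its call as `await TransientRetry.ExecuteAsync(() => grain.PingAsync(), TransientRetry.LooksLikeUnstableDirectory)`, where `PingAsync` stands in for whatever method the probe actually invokes. Because the predicate only matches this one message, calls that fail for other reasons are not retried, which keeps the idempotency concern out of scope.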