orleans icon indicating copy to clipboard operation
orleans copied to clipboard

[Orleans 7] DeactivateOnIdle may cause deadlock when grain has timer

Open zh6335901 opened this issue 2 years ago • 1 comments

Grain code:

public interface ITestGrain : IGrainWithGuidKey
{
    Task TimerTick();

    Task Connect();

    Task Disconnect();
}

public class TestGrain : Grain, ITestGrain
{
    private readonly ILogger _logger;
    private IDisposable? _timer;

    public TestGrain(ILogger<TestGrain> logger)
    {
        _logger = logger;
    }

    public Task Connect()
    {
        _timer = RegisterTimer(
            _ => this.AsReference<ITestGrain>().TimerTick(),
            null,
            TimeSpan.FromSeconds(1),
            TimeSpan.FromSeconds(1)
        );
        
        return Task.CompletedTask;
    }

    public async Task Disconnect()
    {
        await Task.Delay(1000);
        DeactivateOnIdle();
    }

    public async Task TimerTick()
    {
        await Task.Delay(100);
        _logger.LogInformation("Timer ticked!");
    }
}

Client code:

[HttpGet]
public async Task Test()
{
    var grain = _clusterClient.GetGrain<ITestGrain>(Guid.Empty);

    await grain.Connect();
    await grain.Disconnect();
} 

When i executed client code a few times, It most likely appear deadlock, Then TimeoutException thrown in DeactivateOnIdle method:

  Exception when trying to deactivate [Activation: S127.0.0.1:11111:35964260/test/00000000000000000000000000000000@ae30efc925a74d97937d9532889c1d7a#GrainType=OrleansTimerTest.Grains.TestGrain,OrleansTimerTest Placement=RandomPlacement State=Deactivating]
  System.Threading.Tasks.TaskCanceledException: A task was canceled.
     at Orleans.Internal.OrleansTaskExtentions.MakeCancellable(Task task, CancellationToken cancellationToken) in /_/src/Orleans.Core/Async/TaskExtensions.cs:line 188
     at Orleans.Runtime.ActivationData.FinishDeactivating(CancellationToken cancellationToken) in /_/src/Orleans.Runtime/Catalog/ActivationData.cs:line 1380

After DeactivateOnIdle failed, this grain cannot be activated again, every request to this grain would return error:

OrleansMessageRejectionException: Forwarding failed: tried to forward message Request [S127.0.0.1:11111:35964260 sys.client/hosted-127.0.0.1:11111@35964260]->[S127.0.0.1:11111:35964260 test/00000000000000000000000000000000] OrleansTimerTest.Grains.ITestGrainOrleansTimerTest.Grains.ITestGrain.Connect() #65[ForwardCount=2] for 2 times after "Failed to register activation in grain directory." to invalid activation. Rejecting now. Orleans.Serialization.Invocation.ResponseCompletionSource.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(short token) in ResponseCompletionSource.cs, line 107

And i look up the source code of DeactivateOnIdle, I found it contains some code that is waiting for timer's asyncCallback task completion, I think it is root cause of this problem, The code is at below:

private Task WaitForAllTimersToFinish(CancellationToken cancellationToken)
{
    lock (this)
    {
        if (Timers is null)
        {
            return Task.CompletedTask;
        }

        var tasks = new List<Task>();
        var timerCopy = Timers.ToList(); // need to copy since OnTimerDisposed will change the timers set.
        foreach (var timer in timerCopy)
        {
            // first call dispose, then wait to finish.
            Utils.SafeExecute(timer.Dispose, _shared.Logger, "timer.Dispose has thrown");
            tasks.Add(timer.GetCurrentlyExecutingTickTask());
        }

        return Task.WhenAll(tasks).WithCancellation(cancellationToken);
    }
}

zh6335901 avatar Feb 21 '23 06:02 zh6335901

I came across a similar issue, but it was related to an exception happening in the timer code, but there is no exception handling around WaitForAllTimersToFinish. This lack of exception handling leaves the grain a in inconsistent state where it can never be activated again.

BMagerMT avatar Jan 23 '24 19:01 BMagerMT

Fixed by #8951

ReubenBond avatar Jul 11 '24 21:07 ReubenBond