DistributedLock icon indicating copy to clipboard operation
DistributedLock copied to clipboard

Acquired lock handle cannot be disposed

Open nguyencao96 opened this issue 3 years ago • 4 comments

Hello Team, Thanks for really cool package. My application was hanged yesterday and when I attached it to debuggers, it seems that lots of threads were stuck at this state image

The issue is quite easy to replicate by using following test:

[Test]
public void TestParallelAcquire()
{
    ThreadPool.SetMaxThreads(10, 10);
    ThreadPool.SetMinThreads(10, 10);
    var tasks = new List<Task>();
    foreach (var i in Enumerable.Range(0, 100))
    {
        tasks.Add(Task.Run(() =>
        {
            SqlDistributedLockHandle handle = null;
            try
            {
                var locksql = new SqlDistributedLock("MyTestLock", TestingSqlServerDb.DefaultConnectionString, exactName: true);
                handle = locksql.Acquire(TimeSpan.FromSeconds(30), CancellationToken.None);
            }
            finally
            {
                handle?.Dispose();
            }
        }));


    }

    Task.WaitAll(tasks.ToArray());
}

Did I misused the lock, or is there any ways to work around? I've updated the version to 2.3.0 but it did not help. Thanks.

nguyencao96 avatar Dec 23 '21 09:12 nguyencao96

Hi @nguyencao96 thanks for your interest in the library. As I think you suspect, you're hitting a thread starvation issue.

Here are two quick workarounds;

The first and best approach is to use the async locking APIs instead of the synchronous ones. With the async APIs, waiting for the lock doesn't consume a thread and so we won't starve. Here's your test case rewritten to use the async APIs:

        [Test]
        public void TestParallelAcquire()
        {
            ThreadPool.SetMaxThreads(10, 10);
            ThreadPool.SetMinThreads(10, 10);
            var tasks = new List<Task>();
            foreach (var i in Enumerable.Range(0, 100))
            {
                tasks.Add(Task.Run(async () =>
                {
                    var locksql = new SqlDistributedLock(
                               "MyTestLock",
                               TestingSqlServerDb.DefaultConnectionString,
                               exactName: true);
                    await using (await locksql.AcquireAsync(TimeSpan.FromSeconds(30), CancellationToken.None))
                    {
                        Console.WriteLine("I have the lock!");
                    }
                }));
            }

            Task.WaitAll(tasks.ToArray());
        }

Note the two awaits. The await using calls handle.DisposeAsync() which makes sure we release asynchronously. The await locksql.AcquireAsync(...) makes the acquire operation asynchronous.

If you need to keep using the synchronous APIs and you are running into this, the other quick fix is to disable connection keepalive. I'll describe why this helps in a second, but here's the change you'd make to your code:

var locksql = new SqlDistributedLock(
                               "MyTestLock",
                               TestingSqlServerDb.DefaultConnectionString,
                               exactName: true,
                               options: o => o.KeepaliveCadence(Timeout.InfiniteTimeSpan));

Here's what is going on:

  • When keepalive is enabled (which can be important for databases like SQL Azure that like to kill idle lock-holding connections after a while), the lock manages a background Task which pings the database periodically.
  • Because SqlConnection is not thread-safe, we have to tear down this background task before we can safely execute the release query when the lock is disposed. In your case, with the restricted thread pool we end up hanging because all threads are blocked either synchronously waiting to acquire the lock or synchronously waiting for the keepalive task to finish. That leaves no threads to run the implicit continuations necessary to complete the canceled keepalive task.
  • By disabling keepalive, we do away with the keepalive task altogether. Because waiting for the keepalive task to finish is the only operation on the synchronous flow that has to resort to a blocking waint on an async operation (sync-over-async), dropping the keepalive feature makes our code less thread-hungry and therefore not prone to this issue.

If you are using distributed locks with any kind of scale, I'd strongly recommend going fully async since this will avoid any cost to your threadpool!

Please let me know if the above solutions don't work for you.

This issue does make me wonder whether there is a different way to do the keepalive thread that would avoid this for the 99% case that the task does not run for long enough to need to issue a keepalive query. For example, the following Task construction tears down synchronously upon being canceled:

var task = Task.Delay(TimeSpan.FromSeconds(keepaliveCadence), cancellationToken)
	.ContinueWith(t => Task.Run(() => { Console.WriteLine("start keepalive stuff here"); }), cts.Token)
	.Unwrap();

madelson avatar Dec 23 '21 14:12 madelson

Hi @madelson , Thanks a lot for your super clear explanation. Both of the solutions work for us.

We have some logic relying on HttpContext so synchronous APIs is still needed to ensure HttpContext.Current is not null after await . From what I observed, it seems that SQL Server also may kill idle connections after a while (about 10mn). So we will try to keep the inner process small and can be done quickly.

I agreed that if we can tear down the task synchronously, we will not have to worry about this issue anymore. Thanks again :)

nguyencao96 avatar Jan 06 '22 09:01 nguyencao96

Note to self: I think we can achieve synchronous teardown in the common keepalive case by swapping out this code:

            this._monitoringWorkerTask = this._monitoringWorkerTask
                .ContinueWith((_, state) => ((ConnectionMonitor)state).MonitorWorkerLoop(), state: this)
                .Unwrap();

For this code:

 this._monitoringWorkerTask = MonitoringWorkerLoopAsync(this._monitoringWorkerTask);
...

async Task MonitoringWorkerLoopAsync(Task previous)
{
    await previous.ConfigureAwait(false);

    // rest of loop here
}

This way it will run synchronously until it hits the keepalive sleep and should cancel synchronously.

The best way to test would be an invariant check when we're disposing/stopping the connection monitor.

madelson avatar Jan 07 '22 12:01 madelson