
Another deadlock - or at least a very slow lock

Open IanYates opened this issue 6 years ago • 45 comments

I added this to #57 but thought that since it's closed, the repro for a bug there may not have been noticed.

Here's the last post from that thread... Basically it's an issue which v3 doesn't exhibit but v4 and v5 do. I think I may have hit this in production, just after I finally moved to v4.x after staying on a patched v3.x for many months.

It's intermittent but definitely does reproduce for me after a few executions. Hoping you can shed some light on the issue :) Thanks for the library and the general community work!

Post from #57 starts now...

The repro provided by @Allon-Guralnek will cause the issue for me.

In LinqPad I've got this

AsyncLock l = new AsyncLock();

void Main()
{
	Task.Run(() =>
	{

		Console.WriteLine("Starting tasks...");
		var tasks = Enumerable.Range(0, 100).Select(i => Task.Run(() => Do1(i))).ToArray();

		Console.WriteLine("Tasks started. Waiting for all to complete...");
		Task.WaitAll(tasks);

		Console.WriteLine("All done!");
	}).Wait();
}

void Do1(int i)
{
	if (i == 0)
		Thread.Sleep(100);

	using (l.Lock())
	{
		// mutate shared state atomically
	}

	Console.WriteLine("Done: " + i);
}

void Do2(int i)
{
	if (i == 0)
		Thread.Sleep(100);

	lock (l)
	{
		// mutate shared state atomically
	}

//	Console.WriteLine("Done: " + i);
}

You'll note that I've got a Task.Run() in the Main() method to get it started.

Sometimes it runs through just fine. Other times it'll stop.

Example output

Starting tasks...
Done: 1
Tasks started. Waiting for all to complete...
Done: 2
Done: 99

and 1 minute 13 seconds later I'm still nowhere :( After 1 minute 28 seconds it suddenly finished

Starting tasks...
Done: 1
Tasks started. Waiting for all to complete...
Done: 2
Done: 99
Done: 4
Done: 3
Done: 6
Done: 7
Done: 8
Done: 5
Done: 9
Done: 98
Done: 0
Done: 10
Done: 11
Done: 12
Done: 13
Done: 14
[snip - all the rest were in order]

But I can hit stop & start and get it working again. 9 times out of 10 it works properly. Removing the Console.WriteLine inside the loop seems to make it fail more often.

With LinqPad I can quickly swap between packages:

  • Nito.AsyncEx.Coordination NuGet package - this produced the poor output above
  • Nito.AsyncEx NuGet package v3.x - works
  • Nito.AsyncEx NuGet package v4.x (the strong-named one was convenient to use) - failed first go with no output at all :(
  • Nito.AsyncEx v3.x with the one-line patch from earlier - works

So this appears to be a regression where v3.x - with the one-line patch from earlier in this issue - is good but AsyncEx v4.x and v5.x are both bad.

It's a tricky one as it's intermittent and does eventually come good.

IanYates avatar Sep 09 '17 05:09 IanYates

Yes, this is actually a known problem (though it didn't have a GH issue before). In fact, this is exactly the reason why AsyncEx v5 is still in prerelease. One of my commercial clients first made me aware of it, and while they're waiting for a fix they just cranked up the min-threads on the thread pool as a workaround.

This problem comes up most often when:

  • Many thread pool threads are synchronously waiting on an AsyncEx primitive.
  • The thread pool is exhausted.

What's actually happening is that the first lock goes through just fine (synchronously). Then the thread pool is exhausted with a bunch of threads synchronously blocked on that lock. Then when the first thread unlocks, it releases the next waiter, but requires a thread pool thread to do so. Thus, it must wait until the thread pool gets around to running that "release" code. Meanwhile, the thread pool is injecting one new thread every 2 seconds or so, in no rush. So we end up with a lock convoy with huge wait times due to the thread pool injection rate.
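
Here's a minimal sketch of that mechanism outside of AsyncEx, using a raw TaskCompletionSource (the thread counts and timings are illustrative; as discussed later in this thread, .NET Core releases synchronous task waiters directly, so the effect is most visible on .NET Framework):

using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

class PoolExhaustionDemo
{
    static void Main()
    {
        // v5-style: completing this TCS queues its continuations to the pool.
        var gate = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        // Occupy the worker pool with threads blocked on the gate,
        // mimicking many synchronous Lock() calls piling up.
        ThreadPool.GetMinThreads(out int minWorkers, out _);
        var blockers = new Task[minWorkers + 2];
        for (int i = 0; i < blockers.Length; i++)
            blockers[i] = Task.Run(() => gate.Task.GetAwaiter().GetResult());

        Thread.Sleep(1000); // let the blockers saturate the pool

        var sw = Stopwatch.StartNew();
        gate.TrySetResult(true); // the "unlock": waking the waiters needs pool threads
        Task.WaitAll(blockers);  // gated by the slow thread-injection rate
        Console.WriteLine("All released after " + sw.Elapsed);
    }
}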

This is due to the core change of v4 -> v5: all continuations are asynchronous in the core of all the primitives (AsyncWaitQueue is the "core"). This simplifies the code considerably, but does assume that there is available room on the thread pool.

For a solution, I'm trying out variations of a custom "completer" thread. Something very similar to Ayende's blog here.

StephenCleary avatar Sep 10 '17 16:09 StephenCleary

Hi @StephenCleary - thanks very much for the comprehensive answer.

Ayende's solution looks pretty smart. Having a thread set aside to "keep things going" whilst also letting the ThreadPool race to handle the continuation might add a tiny bit of overhead (negligible probably) but avoids this, which is a huge plus.

Your mention of min threads does remind me that in our web app I've bumped up min threads as it helped with something similar a while ago. Maybe I've just run into it again? Anyway, v3.x (with the one change discussed about a year ago from #57 ) is working fine for me at the moment.

I only ran into this, I think, because in our web app I have some leases on some image & blob objects which are acquired asynchronously. There's also a background process which releases them from memory if they haven't been actively used in a while. That's handled using IDisposable, and to ensure a lease isn't returned whilst disposing (returning a lease after dispose just disposes rather than pooling again) I took a sync lock during the Dispose. I still have that code, but since the only thing really doing a Dispose is that background task, it does the resource cleanup asynchronously, effectively using an AsyncDispose call, before calling the real Dispose. Therefore the real Dispose just checks to see if disposed is already set, and if so, doesn't do any extra work. I'm safe enough in case I do a Dispose via other means in future but avoid the sync-lock on an AsyncLock.

Thanks again for the great library & support!
Feel free to close this issue, or if it helps with planning, leave it open.

IanYates avatar Sep 10 '17 23:09 IanYates

Basically these starvation issues can only ever happen when using the AsyncEx primitives both synchronously and asynchronously.

Isn't that right?

molinch avatar Oct 16 '17 08:10 molinch

No; the problem can be caused by either synchronous or asynchronous APIs when the thread pool is saturated. At the core, both sync and async lock APIs require a thread pool to become unlocked, and that's what causes the issue.

StephenCleary avatar Oct 16 '17 12:10 StephenCleary

Thanks, it's clear now.

This is due to the core change of v4 -> v5: all continuations are asynchronous in the core of all the primitives (AsyncWaitQueue is the "core"). This simplifies the code considerably, but does assume that there is available room on the thread pool.

For a solution, I'm trying out variations of a custom "completer" thread.

Don't you think that spawning your own thread to solve this issue would be both overkill and complex? Making continuations synchronous again, so we don't have this issue, seems way nicer and easier to debug.

molinch avatar Oct 18 '17 06:10 molinch

Making the continuations synchronous has its own problems. For a given TaskCompletionSource, its continuations can either be forced-asynchronous (used by v5) or unspecified (used by v4).
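
For concreteness, the two modes look like this (a sketch; TaskCreationOptions.RunContinuationsAsynchronously is the standard TPL flag):

using System.Threading.Tasks;

// v5 style: forced-asynchronous. Even a continuation attached with
// TaskContinuationOptions.ExecuteSynchronously is queued to its scheduler
// rather than run inline by the thread that calls TrySetResult.
var forcedAsync = new TaskCompletionSource<object>(
    TaskCreationOptions.RunContinuationsAsynchronously);

// v4 style: unspecified. A synchronous continuation normally runs inline
// on the completing thread - fast and pool-free, but it executes under
// whatever locks that thread happens to hold.
var unspecified = new TaskCompletionSource<object>();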

Synchronization primitives in v4 avoid this thread pool exhaustion issue by using synchronous continuations. However, since end-user code is also expressed as a continuation, we have to force those continuations to be asynchronous or else we'd end up with a callback-under-lock scenario, which very easily results in deadlock. So in v4, all continuations are actually semi-asynchronous; they're synchronous enough to avoid the thread pool exhaustion but asynchronous enough to avoid the deadlock. As you can imagine, this complicates the code considerably.

I do much prefer the approach forcing asynchrony for end-user continuations, just from a code perspective (simpler, easier to prove correct, easier to maintain). Assuming I keep this, there are two options for avoiding the thread pool exhaustion issue: introducing a separate thread, and separating end-user continuations from internal continuations (requiring separate TCSs). I've played with both and am definitely leaning towards the thread solution.

StephenCleary avatar Oct 18 '17 13:10 StephenCleary

Anytime I've used AsyncEx I've typically bought into multithreading enough that paying the overhead of one more dedicated thread was not an issue for performance. I don't think it would make a difference, one way or the other, for the experience of debugging client code. The thread solution gets my vote if it's simpler to maintain.

LarsKemmann avatar Oct 18 '17 14:10 LarsKemmann

The beauty of asynchronous programming is that "there is no thread". I learned it from Stephen, both from his book and articles.

If it's easier to maintain I understand of course, but I just find it a bit "sad" we have to rely on spawning a manual Thread, which isn't controlled by the ThreadPool. It still feels complicated.

molinch avatar Oct 18 '17 14:10 molinch

Possibly complicated, but it's entirely abstracted away by the library to the point where 99% of library consumers won't even know it's there.

Making continuations asynchronous avoids the "sometimes sync, sometimes async" behaviour that gave jQuery promises a bad name. I appreciate the environment is different - JS is completely single-threaded - but the consistency argument still applies (see http://blog.izs.me/post/59142742143/designing-apis-for-asynchrony and https://medium.com/@bluepnume/intentionally-unleashing-zalgo-with-promises-ab3f63ead2fd for the JS side of the fence).

IanYates avatar Oct 19 '17 06:10 IanYates

Thanks Ian, these are very good arguments. I agree most people will just consume the library without looking deep into the internals.

One remark though: even today when you await in C#, it may complete synchronously or asynchronously as well. If the Task is already completed then the continuation will most likely run synchronously. So what you describe is already happening now (I have to admit it's not the usual use case).

molinch avatar Oct 19 '17 07:10 molinch

Just to clarify, the synchronous vs asynchronous continuations that I'm talking about are only when there is a continuation, so from the end-user's perspective this only applies when the await is asynchronous. When I'm talking about the continuation being synchronous or asynchronous, that's from the perspective of the TCS-completing thread.

To use a lock example, LockAsync will behave synchronously if the lock is unlocked; that's not going to change. The situation we're fixing here is when the lock is already locked, and some code UseResource calls LockAsync (which behaves asynchronously since the lock is already locked) and awaits on the returned task. Then when some thread releases the lock, we need to complete that TCS, but we can't do it while holding an internal lock. We need to push that completion outside of the lock, which is most easily done by forcing the continuation to be asynchronous.
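
Here's a sketch of the UseResource flow described above (assuming Nito.AsyncEx's AsyncLock):

using System.Threading.Tasks;
using Nito.AsyncEx;

static async Task UseResource(AsyncLock mutex)
{
    // The lock is already held, so LockAsync behaves asynchronously:
    // it enqueues a waiter TCS and returns an incomplete task.
    using (await mutex.LockAsync())
    {
        // ... use the shared resource ...
    }
    // When the previous holder releases, it must complete our waiter's TCS.
    // That completion can't safely run under AsyncEx's internal lock, so it
    // is pushed outside the lock by forcing the continuation asynchronous.
}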

So there are still plenty of situations with AsyncEx where end-user code will await, and it will behave synchronously, just like other TAP code. This issue is purely about a careful internal design that will be maintainable, prevent deadlocks, and also continue to work when the thread pool is exhausted. It won't have any effect on end-user semantics as far as which operations act synchronously and which ones are asynchronous.

This actually comes down to how await schedules continuations. Specifically, it always schedules them synchronously - so, after an await acts asynchronously, when its task is completed, the asynchronous method that did the await attempts to resume synchronously on the same thread and stack that is completing the task. When I discovered this, I found it very surprising and reported it as a bug (closed as "by design"). It can cause deadlocks (as described in my bug report) as well as stack overflows, unless end-user code takes steps to prevent it.
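
A small demonstration of that surprise (assuming no SynchronizationContext, e.g. a console app):

using System;
using System.Threading.Tasks;

class InlineResumeDemo
{
    static readonly TaskCompletionSource<bool> Tcs = new TaskCompletionSource<bool>();

    static async Task WaiterAsync()
    {
        await Tcs.Task; // suspends; the task isn't complete yet
        // Usually resumes *inline*, on the thread and stack of whoever
        // completes the TCS (subject to the TPL's stack-depth heuristics).
        Console.WriteLine("resumed on thread " + Environment.CurrentManagedThreadId);
    }

    static void Main()
    {
        var waiter = WaiterAsync();
        Console.WriteLine("completing on thread " + Environment.CurrentManagedThreadId);
        Tcs.TrySetResult(true); // WaiterAsync's continuation typically runs before this returns
        waiter.Wait();
    }
}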

It's my personal opinion that JavaScript's promise callback and await semantics are much better than C#'s await semantics. I think Microsoft was too focused on micro-benchmarks and missed out on providing an easier-to-use abstraction for developers.

StephenCleary avatar Oct 19 '17 13:10 StephenCleary

Reading the VS release notes for 15.4.1 (out as of about 40 minutes ago) I came across this Roslyn thread https://github.com/dotnet/roslyn/issues/22650#issuecomment-335638013

It's not quite the same issue but it reminded me of this discussion. A blocking synchronous resource - disk - coupled with lots of async calls piling up, then the thread pool starving (or growing very slowly). Their problem is that they also gobbled up RAM at the same time. The solution they seem to be discussing (and have released) is to have a single thread to which the other tasks post their requests for work.

Sorry if this is off-topic - I just found it interesting as another manifestation of a problem in the same ballpark as this thread's issue.

IanYates avatar Oct 20 '17 01:10 IanYates

That is very interesting. Particularly since the thread pool has special dedicated I/O threads so that the limited thread injection rate shouldn't affect that kind of scenario. But in reality, a lot of the time the BCL code running on the I/O thread will post work to a worker thread so it "jumps" over to the more limiting side before it executes end-user code.

StephenCleary avatar Oct 22 '17 19:10 StephenCleary

Why not just create a SemaphoreSlim(1,1) and use WaitAsync() on it? That has the effect of locking.

waynebrantley avatar Dec 09 '17 18:12 waynebrantley

@waynebrantley Yes, that's a fine workaround while this bug is still in AsyncEx.
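
For reference, a minimal sketch of that workaround:

using System.Threading;
using System.Threading.Tasks;

class Resource
{
    // SemaphoreSlim(1, 1) acting as an async-compatible mutex.
    private readonly SemaphoreSlim _mutex = new SemaphoreSlim(1, 1);

    public async Task UseAsync()
    {
        await _mutex.WaitAsync();
        try
        {
            // mutate shared state atomically
        }
        finally
        {
            _mutex.Release();
        }
    }
}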

StephenCleary avatar Dec 10 '17 19:12 StephenCleary

Would a wrapped SemaphoreSlim(1,1) cause similar issues under high load?

sjb-sjb avatar Jan 08 '18 13:01 sjb-sjb

@sjb-sjb No; it does not depend on a thread pool thread being free in order to release.

StephenCleary avatar Jan 08 '18 15:01 StephenCleary

I wonder what the rationale is for using the queuing approach rather than the older SemaphoreSlim method.

sjb-sjb avatar Jan 08 '18 17:01 sjb-sjb

Historically, SemaphoreSlim did not support asynchronous operations at the time AsyncLock was written. I'm not sure if SemaphoreSlim supports them on all the AsyncEx platforms today. But even if it did, AsyncLock has some advanced usages (namely, integration with AsyncConditionVariable and the ability to use custom queues to implement alternative locking strategies such as prioritized locks) that can't be supported with a SemaphoreSlim wrapper.

Under the hood, SemaphoreSlim also uses a queuing approach, but their implementation avoids the thread pool exhaustion issue here. They use an approach more similar to what I used in AsyncEx v4 and earlier.

StephenCleary avatar Jan 08 '18 18:01 StephenCleary

SemaphoreSlim does not guarantee the order in which the threads are released.

Having this guarantee is an important feature of AsyncLock IMO.

jam40jeff avatar Jan 08 '18 19:01 jam40jeff

@StephenCleary Which min-thread settings in the thread pool should be enough to alleviate the problem without causing others?

ThreadPool.SetMinThreads(Int32, Int32) Method

jesuissur avatar Jan 11 '18 13:01 jesuissur

@jesuissur Bumping the min-threads is one option, and some v5 users are using that as a workaround for now.

StephenCleary avatar Jan 11 '18 14:01 StephenCleary

@StephenCleary Sorry, I think my question was not clear enough. Which min-threads value should be a good starting point to try the bumping workaround?

jesuissur avatar Jan 11 '18 15:01 jesuissur

@jesuissur That depends on your application type and server. I'd recommend getting the minimum threads first and then increasing it.
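
Something along these lines (the multiplier is purely illustrative, not a recommendation):

using System.Threading;

// Read the current minimums first, then raise only the worker count.
ThreadPool.GetMinThreads(out int workers, out int iocp);
ThreadPool.SetMinThreads(workers * 4, iocp); // e.g. start at 4x and tune from there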

StephenCleary avatar Jan 11 '18 15:01 StephenCleary

Seems it's OK in netcoreapp (but not ported to netfx): Task.Wait() doesn't need a new thread in netcoreapp (whether or not TaskCreationOptions.RunContinuationsAsynchronously is set). https://github.com/dotnet/coreclr/blob/v2.1-rc1/src/mscorlib/src/System/Threading/Tasks/Task.cs#L2955 https://github.com/dotnet/coreclr/commit/0163ed29ca05275f94101051284be9a8ccf235ed

So maybe similar logic can be applied to AsyncEx's (Synchronous)TaskExtensions.Wait*()?

    using System;
    using System.Collections.Generic;
    using System.Threading;
    using System.Threading.Tasks;

    public static class TaskExtensions
    {
        ///......

        // Change all `task.GetAwaiter().GetResult()/Wait(...)` calls to
        // `task.WaitResultSync(...)/WaitSync(...)`.

        private static T WaitResultSync<T>(this Task<T> task, CancellationToken cancellationToken = default(CancellationToken))
        {
            task.WaitSync(cancellationToken);
            return task.GetAwaiter().GetResult();
        }

        private static void WaitResultSync(this Task task, CancellationToken cancellationToken = default(CancellationToken))
        {
            task.WaitSync(cancellationToken);
            task.GetAwaiter().GetResult();
        }

        private static void WaitSync(this Task task, CancellationToken cancellationToken = default(CancellationToken))
        {
            if (task.IsCompleted)
                return;
            // Chain a no-op continuation onto the original task and block on that
            // instead. Even if the task uses RunContinuationsAsynchronously, the
            // continuation is "queued" to ForceInlineTaskScheduler, which runs it
            // immediately on the completing thread - so waking this waiter never
            // needs a free thread pool thread.
            var t2 = task.ContinueWith(t => { }, cancellationToken, TaskContinuationOptions.ExecuteSynchronously | TaskContinuationOptions.HideScheduler, ForceInlineTaskScheduler.Instance);
            t2.Wait(cancellationToken);
        }

        private class ForceInlineTaskScheduler : TaskScheduler
        {
            public static ForceInlineTaskScheduler Instance { get; } = new ForceInlineTaskScheduler();

            private ForceInlineTaskScheduler() { }

            // Debugger support only; nothing is ever held in a queue.
            protected override IEnumerable<Task> GetScheduledTasks() => null;

            protected override void QueueTask(Task task)
            {
                // Execute immediately on the calling (completing) thread.
                TryExecuteTask(task);
            }

            protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
            {
                return TryExecuteTask(task);
            }
        }
    }

yyjdelete avatar May 15 '18 13:05 yyjdelete

The problem isn't that Task.Wait needs a thread pool thread.

The problem is that TaskCompletionSource.TrySetResult executes end-user callbacks (continuations) synchronously, and I call it while holding a lock. To prevent deadlocks, I use TaskCreationOptions.RunContinuationsAsynchronously with that TaskCompletionSource. This is what is requiring the thread pool thread, and causing extreme slowdowns (and possibly deadlocks) when the thread pool is exhausted.

Running the continuations synchronously would just trade one deadlock for another.

StephenCleary avatar May 15 '18 13:05 StephenCleary

Yes, the code above doesn't change that. All outside continuations still run asynchronously. WaitSync() just creates a temporary Task t2, forces it to run synchronously right after the original task, and waits on that new task.

yyjdelete avatar May 15 '18 14:05 yyjdelete

Ah, I see now. That is an interesting approach! I'll take this into consideration.

StephenCleary avatar May 15 '18 14:05 StephenCleary

Can the thread pool exhaustion be detected upfront? So if you start off something that needs one additional thread, could you know that it's going to be a problem and do something about it while a thread is still awake?

ygoe avatar Jun 28 '18 13:06 ygoe

@ygoe: I've never tried. There's ThreadPool.GetAvailableThreads, but it has unclear semantics - is the value the number of currently available threads, or the number of threads that can be created without a cooldown delay?

StephenCleary avatar Jun 29 '18 02:06 StephenCleary

I'm sorry I don't know it myself. Just tried to throw in an idea to get this resolved. My own library is now tied to being a pre-release because of this dependency.

ygoe avatar Jun 30 '18 12:06 ygoe

I've been working on a few different experiments related to this issue. During my travelling for OSCON, I thought it would be useful to write out the background and context in which I'm thinking about this problem. Doing this was a great exercise; it allowed me to step back and identify each of the issues at hand (and then choose a new approach). This comment is mostly a brain-dump for now, but some of it will be reworked into a CONTRIBUTING doc at some point.

Central Postulates:

  1. AsyncEx uses locks, not lockless code. Lockless code is difficult to understand and verify, and - even more dangerously - it's very common to think one understands how to write lockless code when this is not the case, due to the incredible complexity of memory models. See Double-Checked Locking Is Broken for a good historical example from Java.

  2. The greatest cardinal sin of concurrent programming is invoking arbitrary code while holding a lock. In the general case, this includes completing a TCS, since continuations may run synchronously.
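
To make postulate 2 concrete, here's a sketch of the anti-pattern (the names are illustrative):

using System.Threading.Tasks;

class LockThenComplete
{
    private readonly object _gate = new object();
    private readonly TaskCompletionSource<bool> _waiter = new TaskCompletionSource<bool>();

    public void Release()
    {
        lock (_gate)
        {
            // BAD: with a default TCS, synchronous continuations run right here,
            // inside _gate. That arbitrary end-user code can block, acquire other
            // locks in the wrong order, or re-enter and observe state mid-update.
            _waiter.TrySetResult(true);
        }
    }
}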

AsyncEx Code

Most AsyncEx coordination primitive code is straightforward and doesn't even use async/await. However:

  • Some methods do use async/await.
  • One interaction is particularly notable: the relationship between AsyncLock and AsyncConditionVariable. The ACV needs to be able to unlock and relock the AL without affecting the IDisposable "key" that the user already has.
  • Coordination primitives which queue are built on a wait queue abstraction, which permits users to do fancy things like create priority locks. This wait queue abstraction does force us to use sync-over-async in some places instead of using the (preferred) boolean argument hack; AsyncConditionVariable is a good example of this.

So, a good cross-section for experimentation purposes is:

  • AsyncManualResetEvent (extremely simple, no queue)
  • AsyncSemaphore (simple with queue)
  • AsyncLock + AsyncConditionVariable (internal interaction, and code that requires sync-over-async).

When I run experiments like this, I focus on these four types. If it works for them, it should work anywhere.

AsyncEx v4

In v4, all TCS's used synchronous continuations, just like "normal" async code. v4 avoided invoking end-user callbacks under lock by separating the TCS completion code from internal logic. This necessitated a bit of a "dance" between the wait queue and the coordination primitives, with the completion code represented by a disposable.
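
The shape of that dance, roughly (the names and structure here are illustrative, not the actual v4 code):

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class V4StylePrimitive
{
    private readonly object _internalMutex = new object();
    private readonly Queue<TaskCompletionSource<bool>> _queue = new Queue<TaskCompletionSource<bool>>();

    public void Release()
    {
        IDisposable completion = null;
        lock (_internalMutex)
        {
            // All state changes happen under the internal lock; the wait
            // queue hands back a disposable that *will* complete the TCS...
            if (_queue.Count != 0)
                completion = new Completer(_queue.Dequeue());
        }
        // ...and the user-visible completion runs only after the lock is
        // released, so synchronous continuations never execute under it.
        completion?.Dispose();
    }

    private sealed class Completer : IDisposable
    {
        private readonly TaskCompletionSource<bool> _tcs;
        public Completer(TaskCompletionSource<bool> tcs) { _tcs = tcs; }
        public void Dispose() => _tcs.TrySetResult(true);
    }
}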

This worked well enough, but has always had one theoretical issue: stack overflow is a possibility. The TPL does have heuristics in place to prevent this, but they don't always work. AFAIK, this was never actually observed in production.

AsyncEx v5

In v5, one of the main changes was to use asynchronous continuations for TCS's. This simplified the wait queue code (and coordination primitive code) by removing the "dance" completely. Unfortunately, this has caused a dependency on the thread pool even for internal operations. And this issue is not theoretical; it is very rare but has certainly been observed.

Possible Solutions

Solutions must maintain the Central Postulates above, in addition to these considerations:

  • If we have a dedicated thread, then we want to keep end-user code from ever running on that thread. The thread must be dedicated to executing our own short, nonblocking internal code.
  • Our internal code must not depend on a thread pool thread. In the original example code for this problem, Lock returned an IDisposable whose Dispose implementation depended on a thread pool thread. However, it is permissible to depend on a thread pool thread to resume executing asynchronous end-user code.

The solutions that I've considered (some of these have experimental code in different branches):

  1. ForceInline:

    • Use the RunContinuationsAsynchronously | DenyChildAttach flags for all TCS tasks. These can be returned to the user or our own code.
    • Our code uses ForceInline to override the RunContinuationsAsynchronously flag and force our continuations to be synchronous.
    • Evaluation: :+1: Similar to how the old code works. :-1: No protection from stack exhaustion (the old code did have it, in theory). :-1: Users can use ForceInline to override our own force-asynchronous behavior.

  2. Dedicated Thread:

    • Use the RunContinuationsAsynchronously | DenyChildAttach flags for all TCS tasks. These can be returned to the user or our own code.
    • Our code uses ResumeOnDedicatedThread to override the RunContinuationsAsynchronously flag and force the continuations to run asynchronously on our dedicated thread.
    • Our code does not need to complete TCS's from the dedicated thread.
    • Evaluation: :+1: Our code always runs on the dedicated thread. :-1: Users can use ForceInline to override our own force-asynchronous behavior, and thus execute end-user code on our dedicated thread.

  3. TCS Pair with ConfigureAwait(false):

    • FirstTcs is a normal TCS.
    • SecondTcs is a normal TCS, linked from FirstTcs by giving FirstTcs a continuation that propagates its completion to SecondTcs, wrapped in an explicit Task.Run.
    • Our code awaits FirstTcs using ConfigureAwait(false).
    • The user only sees SecondTcs, which is forced asynchronous (due to the Task.Run).
    • Evaluation: :+1: Similar to how the old code works, including (theoretical) protection from stack exhaustion. :+1: Users cannot force synchronous behavior using ForceInline, due to the explicit Task.Run. :-1: Our awaits are not guaranteed to be synchronous. If stack exhaustion is imminent, or if the user is (indirectly) completing a TCS that doesn't want to run our continuation, then we will still require a thread pool thread. The old v4 code had this same problem.

  4. TCS Pair with ForceInline:

    • Just like above, but instead of awaiting FirstTcs.ConfigureAwait(false), our code awaits FirstTcs.ForceInline().
    • Evaluation: :+1: Similar to how the old code works. :+1: Users cannot force synchronous behavior using ForceInline, due to the explicit Task.Run. :-1: No protection from stack exhaustion (the old code did have it, in theory).

  5. TCS Pair with ResumeOnDedicatedThread:

    • Just like above, but instead of awaiting FirstTcs.ConfigureAwait(false), our code awaits FirstTcs.ResumeOnDedicatedThread().
    • Evaluation: :+1: Our code always runs on the dedicated thread. :+1: Users cannot use ForceInline to override our own force-asynchronous behavior, due to the explicit Task.Run. This prevents users from executing end-user code on our dedicated thread.
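
As a rough sketch, the TCS-pair idea common to options 3-5 looks something like this (option 3 shown; error and cancellation propagation omitted, and all names here are illustrative, not the actual AsyncEx API):

using System.Threading.Tasks;

// FirstTcs is completed by our internal code; SecondTcs is the only task
// end-user code ever sees. The explicit Task.Run means user continuations
// can never run inline on the completing (internal) thread.
sealed class PairedCompletion<T>
{
    private readonly TaskCompletionSource<T> _first = new TaskCompletionSource<T>();
    private readonly TaskCompletionSource<T> _second = new TaskCompletionSource<T>();

    public PairedCompletion()
    {
        _first.Task.ContinueWith(
            t => Task.Run(() => _second.TrySetResult(t.Result)),
            TaskContinuationOptions.ExecuteSynchronously);
    }

    public Task<T> InternalTask => _first.Task; // our code awaits this (e.g. with ConfigureAwait(false))
    public Task<T> UserTask => _second.Task;    // handed out to end-user code
    public void Complete(T result) => _first.TrySetResult(result);
}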

Conclusion

After writing all these out, I believe the best solution going forward is the last one. My next step is to experiment with that solution on the experimental cross-section (AsyncManualResetEvent + AsyncSemaphore + AsyncLock + AsyncConditionVariable), and if that looks good, apply it through the rest of the code.

StephenCleary avatar Jul 21 '18 14:07 StephenCleary

I'm very sorry to sound demanding, but this project appears dead to me. This library does a really great job and is really helpful in my project. I've read a lot of helpful and profound comments and answers from you in several places, and that's why I consider your work valuable. I decided to use the v5 branch of AsyncEx to be more future-proof, and I think there was also another reason to use it. But this ongoing delay of a release is starting to get in my way now.

I'm about to release a library myself and it makes use of some of the AsyncEx functions. NuGet forces me to label my package as a pre-release because it has a pre-release dependency. So I'm considering removing that package dependency and incorporating all referenced code into my own code base. This isn't something I like to do, but it seems the only way out of this release handling. A regular fork of the library would be more difficult because I'd need a new name and do all the packaging. And even though forks might immediately help others, too, they usually hurt the community in the long run.

So if you could find some time to resolve this last issue or maybe at least note it as a known issue in a stable release, I'd really appreciate that. There have been some suggestions in this discussion that sound promising. I know I'm not very helpful in the matter, I've tried before but this just exceeds my experience.

ygoe avatar Feb 10 '19 21:02 ygoe

@ygoe Yes, I have considered releasing a stable version with this as a known bug. The perfectionist in me doesn't like that, but commercial software does it all the time.

I've done a ton of work in different branches, approaching the problem different ways, but hitting blocks each time. I did finally find a permanent solution, but haven't had time to implement it (the final solution also requires careful documentation).

At the same time, I'm aware of the problems with keeping a library prerelease for so long. My current plan is to try to finish this issue once and for all over MVP summit (one time of the year when I have a lot more free time than normal). If I don't finish it then, then I promise I'll release it as stable with a known issue.

StephenCleary avatar Feb 10 '19 21:02 StephenCleary

@StephenCleary Thank you for the update. I'm looking forward to end of March this year. :-)

ygoe avatar Feb 10 '19 22:02 ygoe

Concerning the use of SemaphoreSlim, I observe the same issue here, but only when async and sync usage is mixed.

Here is the code to reproduce this issue using SemaphoreSlim:

using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

namespace Example
{
  public class Program
  {
    static readonly SemaphoreSlim s = new SemaphoreSlim(1, 1);

    static void Main()
    {
      Task.Run(() =>
      {
        Console.WriteLine("Starting tasks...");
        var sw = new Stopwatch();
        sw.Start();

        var tasks = RunTasks(onlySync: true, onlyAsync: false);

        Console.WriteLine("Sync tasks started. Waiting for all to complete...");
        Task.WaitAll(tasks);

        sw.Stop();

        Console.WriteLine("All sync tasks done! Time elapsed: " + sw.Elapsed.TotalSeconds + " seconds");

        sw.Reset();
        sw.Start();

        tasks = RunTasks(onlySync: false, onlyAsync: true);

        Console.WriteLine("Async tasks started. Waiting for all to complete...");
        Task.WaitAll(tasks);

        sw.Stop();

        Console.WriteLine("All async tasks done! Time elapsed: " + sw.Elapsed.TotalSeconds + " seconds");

        sw.Reset();
        sw.Start();

        tasks = RunTasks(onlySync: false, onlyAsync: false);

        Console.WriteLine("Mixed tasks started. Waiting for all to complete...");
        Task.WaitAll(tasks);

        sw.Stop();

        Console.WriteLine("All mixed tasks done! Time elapsed: " + sw.Elapsed.TotalSeconds + " seconds");
      }).Wait();

      Console.ReadLine();
    }

    static Task[] RunTasks(bool onlySync, bool onlyAsync)
    {
      bool mixed = !onlySync && !onlyAsync;
      var tasks = new Task[30];

      for (int i = 0; i < 30; i++)
      {
        Task task;
        int i1;

        if (mixed)
        {
          if (i % 2 == 0)
          {
            i1 = i;
            task = Task.Run(() => DoWithSemaphore(i1));
          }
          else
          {
            i1 = i;
            task = Task.Run(async () => await DoWithSemaphoreAsync(i1));
          }
        }
        else if (onlySync)
        {
          i1 = i;
          task = Task.Run(() => DoWithSemaphore(i1));
        }
        else
        {
          i1 = i;
          task = Task.Run(async () => await DoWithSemaphoreAsync(i1));
        }

        tasks[i] = task;
      }

      return tasks;
    }

    static void DoWithSemaphore(int i)
    {
      s.Wait();

      try
      {
        // mutate shared state atomically
        Thread.Sleep(100);
      }
      finally
      {
        s.Release();
      }

      Console.WriteLine("Done: " + i);
    }

    static async Task DoWithSemaphoreAsync(int i)
    {
      await s.WaitAsync();

      try
      {
        // mutate shared state atomically
        await Task.Delay(100);
      }
      finally
      {
        s.Release();
      }

      Console.WriteLine("Done (async): " + i);
    }
  }
}
  • When running the code and we only use "sync" tasks, the program completes after about 3 seconds.

  • When running the code and we only use "async" tasks, the program completes after about 3 seconds.

  • When running the code and we use mixed tasks, the program takes much longer to terminate (it varies a bit).

When you take a look at the source code of SemaphoreSlim, you can see that they are using a TaskNode class that is just a derived Task<> constructed with parameter TaskCreationOptions.RunContinuationsAsynchronously.

As soon as async waiters are in the game (see line 358 of SemaphoreSlim), all follow-up sync waiters are also put in the queue of async waiters.

On the other side, if we only have sync waiters (and no async ones), a pulse-wait approach is used, which does not need a thread pool.

And if we only have async waiters, the problem does not come into play, since we do not block on a task; instead we await it, and when awaited, the underlying thread goes back to the thread pool.

So, I don't think that using SemaphoreSlim solves the issue.

@StephenCleary You mentioned that you have a final solution in mind. I'm looking forward to it and am a little bit curious what kind of solution it is.

peter-perot avatar Aug 27 '19 17:08 peter-perot

@StephenCleary How about this approach? Is it safe?

Havret avatar Jan 12 '20 12:01 Havret

Hey @StephenCleary, I just ran into this problem in production. Until a v5 fix is in place, do you recommend downgrading to v4 or increasing the thread pool minimum thread count? Also, any idea as to when the issue might be resolved in v5?

tmendez3 avatar Apr 21 '20 14:04 tmendez3

@tmendez3 Whether or not increasing your thread pool minimum will help depends on your system.

I use AsyncLock mainly as part of a cache-aside implementation to keep multiple consumers from going to remote caches or system-of-record for the same cache key and I call through that code 500M times a day.

It doesn't matter what I set minimum threads to: when using v5 it will sporadically (between a few times a week and a few times a day) use them all up, and as many more as it can create, before the process gets recycled for being unresponsive. If that starts happening on more than 1 of my VMs at a time it's a quick and slippery slope to cascading overload failure.

On v3, everything is peachy, been running this system for years with 99.995% reliability. I haven't really given v4 a shot, so I cannot comment on that. I also target .Net Framework v4.7.2 - it sounds like maybe Core does not exhibit this issue - of course I'm going through updating dependencies as part of an effort to target Core (no small task on an active 15 year old code base serving nearly a billion requests a month).

I do have two key prefixes in my system that can end up calling AsyncLock.Lock() and AsyncLock.LockAsync() from different places on the same AsyncLock object (every unique cache key gets its own AsyncLock) and I haven't yet tested if removing that overlap clears up our issues with v5 or if they persist.

qstarin avatar May 04 '20 19:05 qstarin

Thanks for that use case @qstarin.

I did de-prioritize this for quite a while due to concerns that fixing this wouldn't actually fix the underlying issue. Specifically, if your code is exhausting the thread pool, sure, that would cause a deadlock (technically a slow lock) with AsyncEx, but that same underlying problem of thread exhaustion would just cause other asynchronous code to have the same issue (severely delayed, deadlock-like behavior); this is because almost all I/O APIs require a free thread pool thread for completion.

However, I do want to fix this issue.

There are a couple of paths forward. One is the "fully correct" solution outlined above; this would require several consecutive days of work and considerably complicate the implementation. Unfortunately, my opportunity for "several consecutive days of work" on a hobby project isn't great. Historically, I've been able to do the "hard" work on AsyncEx at the Microsoft MVP conference. But 2020's MVP conference was virtual so I lost that opportunity, and next year's conference is also virtual.

The other solution is to essentially take the AsyncEx 4 solution. This would only require a weekend or so, not an entire week. Technically, this solution is incorrect because it assumes ExecuteSynchronously always executes synchronously. This is true the vast majority of the time but could be false if the stack is too deep at the point the lock is released. However, I think this is an acceptable drawback; in all the years of AsyncEx, no one has actually reported that as a real production issue. On the other hand, I have suspicions that the stack overflow detection heuristic (which never worked that well on .NET Framework) is more sensitive on .NET Core - and if that is true - this situation would happen more commonly on .NET Core. But I'm guessing that it wouldn't trigger this scenario often enough to cause this issue (which always happens with the current v5).

Well, there are no obviously correct answers. :) But I think that in the interest of resolving this issue for AsyncEx v6 and above (the fix would require a major version rev), I'm going to take a weekend (sometime...) and restore the v4-style completion behavior.

StephenCleary avatar May 17 '20 23:05 StephenCleary

So any news on this?

milspiir avatar Apr 25 '22 14:04 milspiir

@milspiir Not that I've seen, and I'm still on v3.

I did remove any and all mixed sync/async locking on any AsyncLock objects and that made no difference in the deadlocking behavior observed in v5. I also tested v4 and found similar sporadic deadlocking.

A simple SemaphoreSlim lock in place of AsyncLock does not appear to deadlock, but performs quite badly. v3 appears to perform noticeably better in both uncontested and contested (especially contested) use than v4 & v5 - all of which perform vastly better than a simple SemaphoreSlim.

Technically, this solution is incorrect because it assumes ExecuteSynchronously always executes synchronously. This is true the vast majority of the time but could be false if the stack is too deep at the point the lock is released. However, I think this is an acceptable drawback; in all the years of AsyncEx, no one has actually reported that as a real production issue.

fwiw, @StephenCleary, I've had StackOverflowExceptions trigger with the default v3 behavior of resuming TaskCompletionSource's synchronously - it happens when I have close to 300 or more tasks waiting on the same lock object. In those cases, providing AsyncLock with an implementation of IAsyncWaitQueue<T> that creates the TaskCompletionSource<T> with the TaskCreationOptions.RunContinuationsAsynchronously option, but is otherwise the same as the wait queue in v3 of the library, avoids the deadlock - but also performs more poorly than the included implementation that continues synchronously.

qstarin avatar Apr 25 '22 15:04 qstarin

I'm still planning to eventually fix this, and I still haven't had time. As noted previously, my primary hacking time is during conferences, which haven't happened since covid hit.

StephenCleary avatar May 18 '22 12:05 StephenCleary