CacheManager
CacheManager copied to clipboard
RetryHelper causing deadlock
Hi,
We had issues with a web site of ours that freezes from time to time. Looking at the process dumps, there's thousands of threads (initiated by web requests) waiting for a lock in the Eval method in RedisCacheHandle.cs, which is on this line as far as I can see: https://github.com/MichaCo/CacheManager/blob/4385c06a409e9736a7531c5e11003032b8a4c151/src/CacheManager.StackExchange.Redis/RedisCacheHandle.cs#L977
It seems as the reason for that is that one single thread is stuck in RetryHelper.cs, waiting forever for a Task.Delay on this line, that I guess was called because there was some temporary Redis server issue: https://github.com/MichaCo/CacheManager/blob/3f3ad356dce3f7a6a15e7caa0df68ea087beb59b/src/CacheManager.StackExchange.Redis/RetryHelper.cs#L43
Doing .Wait() on a awaitable could AFAIK cause deadlocks when in ASP.NET request context because the synchronization context cannot be restored when the continuation is run: https://blog.stephencleary.com/2012/07/dont-block-on-async-code.html
I guess a workaround would be to add ConfigureAwait to avoid deadlock even though it's sort of a hack:
Task.Delay(timeOut).ConfigureAwait(false).GetAwaiter().GetResult();
Or just use Thread.Sleep(timeOut)
in this case, or am I missing something?
or am I missing something?
No I think you are right, that .Wait
could cause issues here.
I haven't been running into any issues under ASP.NET Core though, do you use classic ASP.NET?
To work around for now, could you try to set the retry trimeout to 0. This should always cause the Task.Delay to run synchronous.
Or, set the retry count to zero, but that could cause a lot of unexpected exceptions because the client tends to throw timeouts randomly in some situations.
Yes, we're currently on classic ASP.NET (in hopefully a not to distant future on ASP.NET Core though) but I guess it could happen in ASP.NET Core as well, but maybe it's not as likely somehow?
I thought about those workarounds you suggested also, for avoiding the Task.Delay Wait to deadlock. I guess that having a few more retries than we have currently (to account for retries being more frequent) and lowering the timeout to 0 could at least do something for the better.
Is it a tedious process to release a 1.2.1 hotfix or similar with just a fix for this? Can I help out with something in that case?