StackExchange.Redis
Frequent timeout errors with session state provider in Azure
(I'm not sure whether here or https://github.com/Azure/aspnet-redis-providers is the right place to post these inquiries. I posted an issue there yesterday but haven't received a response so far, so I'm cross-posting here).
I am using the Redis session state provider with an ASP.NET Azure App Service. I'm using Azure Cache for Redis for the cache.
Last week, I had a look at my app's logs, and noticed that I was getting several dozen Redis timeout errors per day. I have had issues in the past where Redis timeouts brought my app to a complete standstill, but that doesn't seem to be happening now. The timeouts occur sporadically but don't cause a critical disruption.
Service levels:
- App Service: Standard S1
- Redis: P1. I was using C2 until Friday, but the upgrade doesn't seem to have made a difference.
I tried increasing the operation timeout value from 1000ms to 2000ms on Monday and then to 3000ms today, but I am continuing to see about a dozen timeouts per hour.
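For reference, if the provider's operation timeout corresponds to StackExchange.Redis's SyncTimeout (that's my assumption about how the provider wires it up), the equivalent client-side setting would look roughly like this sketch; the endpoint and password are placeholders, not my actual values:

```csharp
using StackExchange.Redis;

// Sketch only: placeholder endpoint/password, not the real cache from this issue.
var options = ConfigurationOptions.Parse(
    "mycache.redis.cache.windows.net:6380,password=...,ssl=True,abortConnect=False");

// SyncTimeout is the time (in ms) a synchronous operation may take before a
// RedisTimeoutException is thrown; 3000 matches the 3000ms I'm currently using.
options.SyncTimeout = 3000;

var muxer = ConnectionMultiplexer.Connect(options);
```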
Below is the error message and stack trace from one of these timeouts. What can I do to figure out why these are happening and how to remedy them?
StackExchange.Redis.RedisTimeoutException: Timeout performing EVAL (3000ms), next: EVAL, inst: 1, qu: 0, qs: 0, aw: False, bw: SpinningDown, rs: ReadAsync, ws: Idle, in: 0, last-in: 0, cur-in: 0, sync-ops: 10498, async-ops: 1, serverEndpoint: ************:6380, conn-sec: 2360.93, mc: 1/1/0, mgr: 10 of 10 available, clientName: ***********(SE.Redis-v2.6.90.64945), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=4,Free=8187,Min=1,Max=8191), v: 2.6.90.64945 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
Stack Trace: at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server, T defaultValue) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 1980
at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server, T defaultValue) in /_/src/StackExchange.Redis/RedisBase.cs:line 62
at StackExchange.Redis.RedisDatabase.ScriptEvaluate(String script, RedisKey[] keys, RedisValue[] values, CommandFlags flags) in /_/src/StackExchange.Redis/RedisDatabase.cs:line 1520
at Microsoft.Web.Redis.StackExchangeClientConnection.<>c__DisplayClass7_0.<Eval>b__0() in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\Shared\StackExchangeClientConnection.cs:line 68
at Microsoft.Web.Redis.StackExchangeClientConnection.OperationExecutor(Func`1 redisOperation) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\Shared\StackExchangeClientConnection.cs:line 95
at Microsoft.Web.Redis.StackExchangeClientConnection.RetryLogic(Func`1 redisOperation) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\Shared\StackExchangeClientConnection.cs:line 118
at Microsoft.Web.Redis.StackExchangeClientConnection.Eval(String script, String[] keyArgs, Object[] valueArgs) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\Shared\StackExchangeClientConnection.cs:line 68
at Microsoft.Web.Redis.RedisConnectionWrapper.TryTakeWriteLockAndGetData(DateTime lockTime, Int32 lockTimeout, Object& lockId, ISessionStateItemCollection& data, Int32& sessionTimeout) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\RedisSessionStateProvider\RedisConnectionWrapper.cs:line 184
at Microsoft.Web.Redis.RedisSessionStateProvider.GetItemFromSessionStore(Boolean isWriteLockRequired, HttpContextBase context, String id, CancellationToken cancellationToken, Boolean& locked, TimeSpan& lockAge, Object& lockId, SessionStateActions& actions) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\RedisSessionStateProvider\RedisSessionStateProvider.cs:line 291
at Microsoft.Web.Redis.RedisSessionStateProvider.<GetItemExclusiveAsync>d__20.MoveNext() in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\RedisSessionStateProvider\RedisSessionStateProvider.cs:line 206
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.AspNet.SessionState.SessionStateModuleAsync.<GetSessionStateItemAsync>d__74.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.AspNet.SessionState.SessionStateModuleAsync.<AcquireStateAsync>d__65.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.AspNet.SessionState.TaskAsyncHelper.EndTask(IAsyncResult ar)
at Microsoft.AspNet.SessionState.SessionStateModuleAsync.EndAcquireState(IAsyncResult result)
at System.Web.HttpApplication.AsyncEventExecutionStep.InvokeEndHandler(IAsyncResult ar)
at System.Web.HttpApplication.AsyncEventExecutionStep.OnAsyncEventCompletion(IAsyncResult ar)
What is the server endpoint here, and where is it in relation to the app? I'm trying to get a feel for the payload sizes and latency involved here.
For what it's worth, the default max timeout against Azure instances is 5000ms, but that's based on people often connecting cross-region and such unaware of the consequences, so let's see what's happening in your specific scenario.
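One quick way to put a number on the latency half of that question is a PING round trip from the app to the cache, which is independent of payload size. A minimal sketch, assuming an already-connected ConnectionMultiplexer called muxer:

```csharp
using System;
using StackExchange.Redis;

// Sketch: "muxer" is assumed to be the app's shared, already-connected ConnectionMultiplexer.
IDatabase db = muxer.GetDatabase();

// Ping measures a single request/response round trip to the server endpoint,
// giving a baseline latency figure to compare against the configured timeout.
TimeSpan rtt = db.Ping();
Console.WriteLine($"Round trip to Redis: {rtt.TotalMilliseconds:F1} ms");
```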
@NickCraver Thank you very much for your reply.
I'm not sure what you mean by "what is the server endpoint", but both the website and the Redis cache are in the same Azure region (West US). To reiterate, the website is an Azure App Service and the Redis cache is an Azure Cache for Redis instance, currently P1 service level.
This looks like ASP.NET (meaning: not "current" .NET) code, and it's a known reality that "old" ASP.NET is particularly glitchy with "sync over async" scenarios (in the context of thread-pool exhaustion), and this looks to be synchronous code in the provider. My recommendations would be:
1. Log an issue with the provider (https://github.com/Azure/aspnet-redis-providers) asking for true async support.
2. Consider migrating to "current" .NET.
I understand that option 2 is complex in some scenarios, but in terms of impact there's a huge difference between "current" .NET and "old" ASP.NET; it may be worth considering.
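On the sync-over-async point: the stack trace above shows the provider calling the synchronous ScriptEvaluate from the async session module, and the error text reports Min=1 for both WORKER and IOCP, so slow thread-pool ramp-up may be compounding the blocking. A mitigation that is often suggested for classic ASP.NET while staying on the current provider (a sketch under that assumption, not something prescribed in this thread) is to raise the thread-pool minimums at application startup, e.g. from Application_Start:

```csharp
using System;
using System.Threading;

public static class ThreadPoolWarmup
{
    // Sketch only: the target values below are illustrative, not a recommendation from this thread.
    public static void Apply()
    {
        ThreadPool.GetMinThreads(out int workerMin, out int iocpMin);

        // The timeout message shows Min=1 for both WORKER and IOCP; raising the minimums means
        // bursts of blocked sync-over-async calls don't wait on the pool's slow thread injection.
        ThreadPool.SetMinThreads(Math.Max(workerMin, 200), Math.Max(iocpMin, 200));
    }
}
```

This does not remove the underlying sync-over-async pattern; it only reduces how long requests queue while the pool grows, so the two recommendations above still apply.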