StackExchange.Redis icon indicating copy to clipboard operation
StackExchange.Redis copied to clipboard

[StackExchange.Redis 2.6.104] Timeout performing GET (5000ms) with no abnormal busy threads

Open KendoSai opened this issue 1 year ago • 6 comments

Hi, we are using Redis Sentinel + StackExchange.Redis 2.6.104 with this configuration,

var readLazyConnection = new Lazy<ConnectionMultiplexer>(() =>
            {
                ConfigurationOptions sentinelConfig = new ConfigurationOptions();
                sentinelConfig.TieBreaker = "";
                sentinelConfig.CommandMap = CommandMap.Sentinel;
                sentinelConfig.AbortOnConnectFail = false;
                sentinelConfig.SyncTimeout = 3000;
                sentinelConfig.Ssl = false;
                sentinelConfig.ConnectRetry = 3;
                sentinelConfig.ReconnectRetryPolicy = new ExponentialRetry(5000);
                sentinelConfig.KeepAlive = 180;
                sentinelConfig.ServiceName = instanceName;
                sentinelConfig.SocketManager = SocketManager.ThreadPool;
                sentinelConfig.IncludeDetailInExceptions = true;
                sentinelConfig.IncludePerformanceCountersInExceptions = true;

                foreach (var item in connectionString.Split(';'))
                {
                    sentinelConfig.EndPoints.Add(item);
                }

                ConnectionMultiplexer.SetFeatureFlag("preventthreadtheft", true);
                ConnectionMultiplexer sentinelConnection = ConnectionMultiplexer.SentinelConnect(sentinelConfig);

                ConfigurationOptions masterConfig = new ConfigurationOptions();
                masterConfig.ServiceName = "mymaster";

                ConnectionMultiplexer masterConnection = sentinelConnection.GetSentinelMasterConnection(masterConfig);

                return masterConnection;
            });

Few days ago, we started receiving Timeout exceptions

Timeout performing GET (5000ms), next: GET PageInfo_ProjectNaviagtionId_0, inst: 4, qu: 0, qs: 9, aw: False, bw: SpinningDown, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, last-in: 471, cur-in: 0, sync-ops: 51041, async-ops: 16015, serverEndpoint: 192.168.100.101:6379, conn-sec: 579.71, aoc: 1, mc: 1/1/0, mgr: 10 of 10 available, clientName: e745dc31d241(SE.Redis-v2.6.104.40210), IOCP: (Busy=0,Free=1000,Min=1000,Max=1000), WORKER: (Busy=21,Free=32746,Min=1000,Max=32767), POOL: (Threads=28,QueuedItems=0,CompletedItems=116031), v: 2.6.104.40210 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

I'm confusing because number of busy threads are still under Min/Max threads. Looks like the sync-ops are quite high because we are favoring synchronous calls to get/set redis cache (StringGet, StringSet) instead of async call. So I'm not really sure what the problem is and what could I do?

KendoSai avatar Apr 20 '23 17:04 KendoSai

What is the size of the payload you are reading from Redis in this case? Large payloads are pipe blockers which may be what you're seeing here, but also sync operations, combined both are bad but you shouldn't be stalled on immediate thread pool growth given you min config I agree.

NickCraver avatar Aug 02 '23 11:08 NickCraver

Hi @NickCraver, sorry for my late reply

We have managed to reduce the number of timeout exceptions from ~2000-3000 exceptions a day to ~100-200 exceptions a day by converting all synchronous method calls to async ones. I've tried to implement the ConnectionMultiplexer pooling but it doesn't work yet.

I still want to find the root cause and get rid of timeout exceptions but looks like I'm having no clues :( Here is the latest timeout exception

Timeout awaiting response (outbound=901KiB, inbound=0KiB, 5508ms elapsed, timeout is 5000ms), command=SETEX, next: SETEX listing_lists_categories_nha-dat-ban__62_357, inst: 0, qu: 0, qs: 4, aw: False, bw: SpinningDown, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, last-in: 2, cur-in: 0, sync-ops: 1, async-ops: 323034, serverEndpoint: 192.168.100.101:6379, conn-sec: 4670.11, aoc: 0, mc: 1/1/0, clientName: sitemap-68f4ff78b4-sw6mn(SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=2,Free=32765,Min=1,Max=32767), POOL: (Threads=2,QueuedItems=4,CompletedItems=1106635,Timers=18), v: 2.6.122.38350 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

Strangely I couldn't find the mgr attribute in the exception message, do you know why?

I have tried to subscribe to StackExchange.Redis events but there is no logged events as well

ConnectionMultiplexer connection= ConnectionMultiplexer.Connect(config);  
connection.ConnectionFailed += ConnectionFailed;
connection.ConnectionRestored += ConnectionRestored;
connection.InternalError += InternalError;
connection.ErrorMessage += ErrorMessage;

private void ConnectionFailed(object? sender, ConnectionFailedEventArgs e)
{
    _logger.LogError(e.Exception, "Redis connection error {FailureType}", e.FailureType.ToString());
}

private void ConnectionRestored(object? sender, ConnectionFailedEventArgs e)
{
    _logger.LogInformation("Redis connection error restored");
}

private void InternalError(object? sender, InternalErrorEventArgs e)
{
    _logger.LogError(e.Exception, "Redis internal error {Origin}", e.Origin);
}

private void ErrorMessage(object? sender, RedisErrorEventArgs e)
{
    _logger.LogError("Redis error: {Message}", e.Message);
}

KendoSai avatar Sep 06 '23 10:09 KendoSai

Lack of mgr indicates that the socket manager does not exist - is it possible this is in the midst of a dispose of the multiplexer?

NickCraver avatar Sep 06 '23 13:09 NickCraver

Let me check on that. Btw do you know if I should set the SocketManager to SocketManager.ThreadPool

The current config looks like this config.SocketManager = SocketManager.ThreadPool;

KendoSai avatar Sep 06 '23 16:09 KendoSai

If you do that, you're at the mercy of the thread pool to service the socket - the default is the dedicated socket manager with it's own thread pool - I don't see threads exhausted by what you're doing but you can try both ways (e.g. removing the setting override you currently have)

NickCraver avatar Sep 06 '23 16:09 NickCraver

Thanks @NickCraver for sharing the info. Let me try then.

KendoSai avatar Sep 06 '23 16:09 KendoSai