
`HMGET` frequent errors like StackExchange.Redis.RedisTimeoutException

hyperion-cs opened this issue 3 years ago · 14 comments

Redis version: 7.0.4
StackExchange.Redis version: 2.6.48

Hello! Thank you for a wonderful project. Unfortunately, I sometimes get timeouts of the following kind:

StackExchange.Redis.RedisTimeoutException:
Timeout awaiting response (outbound=6KiB, inbound=0KiB, 5000ms elapsed, timeout is 5000ms),
command=HMGET, next: HMGET xxxc9d0d0d7-1715-e09b-8579-e7db6fe1e070, inst: 0, qu: 0, qs: 117, aw: True, bw: WritingMessage,
rs: ReadAsync, ws: Writing, in: 6307, in-pipe: 0, out-pipe: 0, serverEndpoint: xxx:6379, mc: 1/1/0, mgr: 10 of 10 available,
clientName: xxx(SE.Redis-v2.6.48.48654), IOCP: (Busy=0,Free=1000,Min=8,Max=1000),
WORKER: (Busy=50,Free=32717,Min=8,Max=32767),
POOL: (Threads=50,QueuedItems=260,CompletedItems=67095830), v: 2.6.48.48654

Server resource utilization at such times is low (the client and the server are on different nodes, but with a high-bandwidth link and extremely low latency, <1 ms). The client has 8 vCPU and 64 GB RAM. The SLOWLOG GET command returns (empty array), so the problem doesn't seem to be with Redis itself.

I also found several related issues: https://github.com/StackExchange/StackExchange.Redis/issues/2186, https://github.com/StackExchange/StackExchange.Redis/issues/2185, https://github.com/StackExchange/StackExchange.Redis/issues/2069, https://github.com/StackExchange/StackExchange.Redis/issues/1859, https://github.com/StackExchange/StackExchange.Redis/issues/1570, https://github.com/StackExchange/StackExchange.Redis/issues/971. However, they did not help me solve the problem.

Perhaps you can point me in the right direction to find a solution?

hyperion-cs avatar Jul 27 '22 10:07 hyperion-cs

My personal experience over the past two days with StackExchange.Redis.RedisTimeoutException errors on Redis 6.x and StackExchange.Redis v2.2.4 and then v2.6.58 was that, in the end, they were caused by thread exhaustion in our application on the client side, not by anything wrong with Redis.

I read the https://stackexchange.github.io/StackExchange.Redis/Timeouts docs, read many similar issues reported in this repo, and ran a number of single-threaded and multi-threaded tests directly against Redis with redis-cli, Ruby, and Python until I was satisfied there was nothing wrong with the Redis instance.

Our integration with StackExchange.Redis is via the Microsoft.Extensions.Caching.StackExchangeRedis NuGet package and the AddStackExchangeRedisCache DI wire-up method, rather than creating a ConnectionMultiplexer directly as described in the StackExchange.Redis docs, and our application code is unfortunately all non-async at present. We started out with the default internal ConnectionMultiplexer and began seeing StackExchange.Redis.RedisTimeoutException errors when the application came under heavy request load.

Despite the docs recommending the thread pool min-threads change for StackExchange.Redis v1.x only, I decided to try a similar ThreadPool.SetMinThreads change with v2.6.58, coupled with SocketManager.ThreadPool to force use of the system thread pool instead of the internal SocketManager.Default thread pool. Our StackExchange.Redis.RedisTimeoutException problems disappeared after this change was applied.
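
For reference, a rough sketch of that combination looks something like the following (this is not our exact code; the endpoint and thread counts are illustrative placeholders):

using System.Threading;
using StackExchange.Redis;

// Raise the minimum worker/IOCP threads so bursts of load don't have to wait
// for the thread pool to ramp up (values are illustrative only).
ThreadPool.SetMinThreads(workerThreads: 256, completionPortThreads: 256);

var options = ConfigurationOptions.Parse("redis-host:6379"); // placeholder endpoint
options.SocketManager = SocketManager.ThreadPool;            // use the shared .NET thread pool
var connection = ConnectionMultiplexer.Connect(options);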

I'm not suggesting that this type of change will work for everybody, and I know that our non-async code is not fit for purpose, but this comment might be useful for anyone else with similarly rubbish application code that exhibits RedisTimeoutException problems under heavy load :)

bebo-dot-dev avatar Jul 27 '22 19:07 bebo-dot-dev

@bebo-dot-dev, thank you for your detailed response and your experience. We also do not use StackExchange.Redis directly. It is used for distributed sessions via Microsoft.Extensions.Caching.StackExchangeRedis (as in your case, the AddStackExchangeRedisCache extension method). Can you tell me how you changed SocketManager.Default to SocketManager.ThreadPool in this case? Would something like this be suitable?

var redisCacheOptions = ConfigurationOptions.Parse(...);
redisCacheOptions.SocketManager = SocketManager.ThreadPool;

services.AddStackExchangeRedisCache(option =>
{
    ...
    option.ConfigurationOptions = redisCacheOptions;
    ...
});

We have already increased ThreadPoolMinThreads to 256 in the .csproj:

<ThreadPoolMinThreads>256</ThreadPoolMinThreads>

However, the problem has not gone away:

Timeout awaiting response (outbound=6KiB, inbound=0KiB, 5720ms elapsed, timeout is 5000ms),
command=HMGET, next: HMGET xxxeba3a381-b033-dfda-a5a8-04f709813a07, inst: 0, qu: 0, qs: 427, aw: False, bw: SpinningDown,
rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, serverEndpoint: xxx:6379, mc: 1/1/0, mgr: 9 of 10 available,
clientName: xxx(SE.Redis-v2.6.48.48654), IOCP: (Busy=0,Free=1000,Min=8,Max=1000),
WORKER: (Busy=42,Free=32725,Min=256,Max=32767),
POOL: (Threads=81,QueuedItems=2,CompletedItems=224488645), v: 2.6.48.48654 

hyperion-cs avatar Jul 28 '22 10:07 hyperion-cs

Your code looks OK to me, @hyperion-cs, and your latest error message shows that WORKER min threads is 256, so it looks like your config change has taken effect.

Something doesn't add up here for me, though: the mgr: 9 of 10 available in the error appears to indicate that the StackExchange.Redis internal thread pool is still in use instead of the regular .NET thread pool.

I'm not sure what the cause of that is, to be honest. I guess it could be related to how your code configures options inside AddStackExchangeRedisCache, mixing a .Parse call with setting the SocketManager property directly afterwards. Maybe try removing the .Parse call and setting all values explicitly via the options.
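
Something along these lines is what I mean (a sketch only; the endpoint and timeout values are placeholders):

var redisCacheOptions = new ConfigurationOptions
{
    EndPoints = { "xxx:6379" },               // placeholder endpoint
    SocketManager = SocketManager.ThreadPool, // regular .NET thread pool
    SyncTimeout = 5000                        // ms, illustrative value
};

services.AddStackExchangeRedisCache(option =>
{
    option.ConfigurationOptions = redisCacheOptions;
});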

bebo-dot-dev avatar Jul 28 '22 17:07 bebo-dot-dev

@bebo-dot-dev, I replaced the internal thread pool with the regular .NET thread pool after I left the comment above. So far I have not observed any timeout problems, but we need to wait a little longer to be sure. All in all, your advice seems to be working, thank you!

hyperion-cs avatar Jul 28 '22 17:07 hyperion-cs

Is your HashSet value too large? You could try increasing your Redis timeout to see if it helps.
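
For example, something like this (a sketch only; the endpoint and timeout values are illustrative, not a recommendation):

var redisCacheOptions = ConfigurationOptions.Parse("xxx:6379"); // placeholder endpoint
redisCacheOptions.SyncTimeout = 10000;   // timeout for synchronous operations, in ms (default is 5000)
redisCacheOptions.AsyncTimeout = 10000;  // timeout for asynchronous operations, in ms
// or equivalently in the configuration string: "xxx:6379,syncTimeout=10000,asyncTimeout=10000"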

WeihanLi avatar Jul 29 '22 07:07 WeihanLi

@WeihanLi, no, the HashSet values are quite small: ≈50-60 bytes at most. So the recommendations from @bebo-dot-dev seem to have really helped, and this issue can be closed. Thank you all very much!

hyperion-cs avatar Aug 01 '22 11:08 hyperion-cs

@bebo-dot-dev, @WeihanLi, colleagues, unfortunately I got the error again today:

Timeout awaiting response (outbound=3KiB, inbound=0KiB, 5268ms elapsed, timeout is 5000ms),
command=EXPIRE, next: HMGET xxx29086ec2-3a36-8113-4b55-5b230ba1968f, inst: 0, qu: 0, qs: 38, aw: False, bw: SpinningDown
rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, serverEndpoint: xxx:6379, mc: 1/1/0,
clientName: xxx(SE.Redis-v2.6.48.48654), IOCP: (Busy=0,Free=1000,Min=8,Max=1000),
WORKER: (Busy=22,Free=32745,Min=256,Max=32767),
POOL: (Threads=37,QueuedItems=1,CompletedItems=494244690), v: 2.6.48.48654

Three things stand out:

  1. There is no mgr value in the error - this indicates that the StackExchange.Redis internal thread pool is disabled and the .NET thread pool is used;
  2. WORKER(min) value is 256;
  3. POOL(Threads) is less than 256 and there is a queue. Why is this happening?

hyperion-cs avatar Aug 04 '22 16:08 hyperion-cs

@hyperion-cs what is the output from your redis-server when you run:

redis-cli --bigkeys

and

redis-cli --memkeys

slorello89 avatar Aug 04 '22 16:08 slorello89

@slorello89,

--bigkeys:

# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type.  You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).

[00.00%] Biggest hash   found so far '"xxx457f183a-9667-9b6d-cfb5-4c6a934f9e71"' with 3 fields
[96.50%] Biggest list   found so far '"yyy"' with 7 items

-------- summary -------

Sampled 486 keys in the keyspace!
Total key length in bytes is 27663 (avg len 56.92)

Biggest   list found '"yyy"' has 7 items
Biggest   hash found '"xxx457f183a-9667-9b6d-cfb5-4c6a934f9e71"' has 3 fields

1 lists with 7 items (00.21% of keys, avg size 7.00)
485 hashs with 1455 fields (99.79% of keys, avg size 3.00)
0 strings with 0 bytes (00.00% of keys, avg size 0.00)
0 streams with 0 entries (00.00% of keys, avg size 0.00)
0 sets with 0 members (00.00% of keys, avg size 0.00)
0 zsets with 0 members (00.00% of keys, avg size 0.00)

--memkeys:

# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type.  You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).

[00.00%] Biggest hash   found so far '"xxx457f183a-9667-9b6d-cfb5-4c6a934f9e71"' with 216 bytes
[96.50%] Biggest list   found so far '"yyy"' with 7312 bytes

-------- summary -------

Sampled 486 keys in the keyspace!
Total key length in bytes is 27663 (avg len 56.92)

Biggest   list found '"yyy"' has 7312 bytes
Biggest   hash found '"xxx457f183a-9667-9b6d-cfb5-4c6a934f9e71"' has 216 bytes

1 lists with 7312 bytes (00.21% of keys, avg size 7312.00)
485 hashs with 98696 bytes (99.79% of keys, avg size 203.50)
0 strings with 0 bytes (00.00% of keys, avg size 0.00)
0 streams with 0 bytes (00.00% of keys, avg size 0.00)
0 sets with 0 bytes (00.00% of keys, avg size 0.00)
0 zsets with 0 members (00.00% of keys, avg size 0.00)

I also found something in SLOWLOG for the first time (although, judging by the timestamps and keys of the entries, they are not related to the timeout error, since they happened on a different day):

1) 1) (integer) 4
   2) (integer) 1659483831
   3) (integer) 12355
   4) 1) "HMGET"
      2) "xxxb5eeb393-59f4-6fe8-ddff-2c4e6d8196fe"
      3) "absexp"
      4) "sldexp"
      5) "data"
   5) "zzz:49164"
   6) "ccc(SE.Redis-v2.6.48.48654)"
2) 1) (integer) 3
   2) (integer) 1659190396
   3) (integer) 13641
   4) 1) "EXPIRE"
      2) "xxxd1ac4cd0-fb88-3a35-fd21-d635ea07fd79"
      3) "28800"
   5) "zzz:49164"
   6) "ccc(SE.Redis-v2.6.48.48654)"
3) 1) (integer) 2
   2) (integer) 1658993073
   3) (integer) 16221
   4) 1) "HMGET"
      2) "xxx1aed4b2b-cf61-e361-8xb0-4d214713d1c3"
      3) "absexp"
      4) "sldexp"
      5) "data"
   5) "zzz:35942"
   6) "ccc(SE.Redis-v2.6.48.48654)"
4) 1) (integer) 1
   2) (integer) 1658981542
   3) (integer) 15362
   4) 1) "HMGET"
      2) "xxx7a1f7b63-2b77-0cf2-0c6d-3a9904e3694a"
      3) "absexp"
      4) "sldexp"
      5) "data"
   5) "zzz:35942"
   6) "ccc(SE.Redis-v2.6.48.48654)"
5) 1) (integer) 0
   2) (integer) 1658928730
   3) (integer) 14967
   4) 1) "HMGET"
      2) "xxx2b8e279e-d944-39d7-3ca1-5e4b70f699ee"
      3) "absexp"
      4) "sldexp"
   5) "zzz:35942"
   6) "ccc(SE.Redis-v2.6.48.48654)"

hyperion-cs avatar Aug 04 '22 16:08 hyperion-cs

I see that your last failure was for an EXPIRE command and your SLOWLOG includes an EXPIRE entry. That's all I'm able to see and say; there's not enough shown here to prove or disprove a correlation between when the EXPIRE timeout occurred and the SLOWLOG entries.

bebo-dot-dev avatar Aug 06 '22 14:08 bebo-dot-dev

Same issue for me: RedisTimeoutException, but it happened while trying to fetch more than 20K records.

I'm using a pagination strategy with page size = 500; it always gets stuck at 10,000 records and throws the exception. I'm using StackExchange.Redis v2.6.48 with the latest version of Redis OM in C#.


icespedesc avatar Aug 07 '22 21:08 icespedesc

@icespedesc - what you're seeing is an issue in RediSearch that has been resolved. See my write up in redisearch/redisearch#2803

slorello89 avatar Aug 08 '22 04:08 slorello89

I was struggling with the same issue, and I had to set ResolveDns to true. This was when connecting to Redis between apps on Fly.io with their app-name.internal endpoints.
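
Roughly like this (a sketch only; the endpoint is a placeholder):

var options = ConfigurationOptions.Parse("app-name.internal:6379"); // placeholder endpoint
options.ResolveDns = true; // resolve the DNS name explicitly/eagerly when connecting
// or in the configuration string: "app-name.internal:6379,resolveDns=true"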

wub avatar Aug 10 '22 05:08 wub

@wub, we connect to the Redis server by IP (without a domain name), so it seems reasonable to assume that the DNS resolution option would not help in our case. However, thank you for the information.

hyperion-cs avatar Aug 10 '22 13:08 hyperion-cs

I'm going to close this out to tidy up since we actually have 4 or 5 distinctly different issues causing various timeout scenarios in here (all answered I think). If anyone landing on this has a specific issue they can't figure out, recommend opening an issue dedicated to that so we can advise!

NickCraver avatar Sep 07 '22 00:09 NickCraver