
Redisson client after upgrade to 3.17.4 causing a lot of timeout issues

Open pranayd26 opened this issue 3 years ago • 9 comments

Hello, after upgrading the Redisson client from 3.15.5 to 3.17.4, we are facing a lot of timeout issues.

org.redisson.client.RedisResponseTimeoutException: Redis server response timeout (1000 ms) occured after 0 retry attempts. Increase nettyThreads and/or timeout settings. Try to define pingConnectionInterval setting. Command: (GET), params: [accessPolicy.6JPYPCDDKN8F], channel: [id: 0x4cd3581a, L:/10.33.250.201:57000 - R:10.45.73.199/10.45.73.199:6379] at org.redisson.command.RedisExecutor.lambda$scheduleResponseTimeout$5(RedisExecutor.java:341) at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715) ... 6 truncated

If we roll back to the previous version 3.15.5, we don't see the issue. Please let us know whether this is a known issue and how we can get rid of it.

Thanks, Pranay

pranayd26 avatar Jun 27 '22 12:06 pranayd26

We are seeing the same issue with RedissonExecutorService.submit. After upgrading from 3.15 to 3.17.4, the latency of submitting a task spiked from single-digit ms to 1 s.

here is our config

            config.setNettyThreads(128)
                  .useClusterServers()
                  .addNodeAddress(String.format("rediss://%s:%s", CONTACT_POOL_EC_ADDRESS, CONTACT_POOL_EC_PORT))
                  .setTimeout(8000)
                  .setRetryInterval(50)
                  .setMasterConnectionPoolSize(256)
                  .setMasterConnectionMinimumIdleSize(16)
                  .setSlaveConnectionPoolSize(256)
                  .setSlaveConnectionMinimumIdleSize(16)
                  .setKeepAlive(true)
                  .setPingConnectionInterval(10000)
                  .setUsername( "default");

xyqshi avatar Jun 29 '22 22:06 xyqshi

I met the same issue after upgrading to 3.17.4; the error log is below. Thanks.

RedisResponseTimeoutException: Redis server response timeout (3000 ms) occured after 3 retry attempts, is non-idempotent command: false. Increase nettyThreads and/or timeout settings. Try to define pingConnectionInterval setting. Command: (HSET), params: [XXXXXXXXXXX, PooledUnsafeDirectByteBuf(ridx: 0, widx: 24, cap: 256), PooledUnsafeDirectByteBuf(ridx: 0, widx: 544, cap: 1024)], channel: [id: 0x6ce9c160, L:/10.101.15.138:64890 - R:10.0.16.44/10.0.16.44:6379]

yinyansheng avatar Sep 16 '22 05:09 yinyansheng

@yinyansheng Can you check which version exactly introduced this timeout for your application?

mrniko avatar Sep 16 '22 05:09 mrniko

Try setting checkLockSyncedSlaves = false with the new version.
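For reference, a minimal sketch of where that setting goes (the cluster mode and node address below are assumptions, not taken from any config in this thread):

    import org.redisson.Redisson;
    import org.redisson.api.RedissonClient;
    import org.redisson.config.Config;

    public class LockSyncConfigExample {
        public static void main(String[] args) {
            Config config = new Config();
            // Skip waiting for lock state to be synced to slave nodes, per the suggestion above.
            config.setCheckLockSyncedSlaves(false);
            // Cluster mode and the address are placeholders; apply the flag to
            // whichever server mode your existing config already uses.
            config.useClusterServers()
                  .addNodeAddress("redis://127.0.0.1:6379");

            RedissonClient client = Redisson.create(config);
            client.shutdown();
        }
    }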

mrniko avatar Sep 16 '22 05:09 mrniko

@mrniko sorry, I wrote the wrong version.
My current version is 3.17.4; it still happens occasionally, several times a week.

Here is an error sample: RedisResponseTimeoutException: Redis server response timeout (3000 ms) occured after 3 retry attempts, is non-idempotent command: false. Increase nettyThreads and/or timeout settings. Try to define pingConnectionInterval setting. Command: (EXISTS), params: [XXXXXX], channel: [id: 0x083772ea, L:/10.101.14.49:27095 - R:10.0.30.151/10.0.30.151:6379]

My service has 200+ instances connecting to AWS ElastiCache (which contains 64 master nodes and 64 slave nodes), and the Redis call volume is extremely high.

Is it possible that too many instances or too many Redis nodes cause this issue?

I plan to make three changes: 1. upgrade to the latest version, 2. change Netty threads from 32 to 64, 3. change the idle timeout from 10000 to 30000.

Besides checkLockSyncedSlaves = false, is there any other config I can try?

I really appreciate your reply.

yinyansheng avatar Sep 18 '22 05:09 yinyansheng

@yinyansheng

Which Redisson objects do you use?

mrniko avatar Sep 18 '22 07:09 mrniko

@yinyansheng

Which Redisson objects do you use?

Mainly the objects below:

RMap<String, String> map = redissonClient.getMap(key);
RDeque<String> deque = redissonClient.getDeque(key);

Commands like HSET, HGET, HGETALL, LPUSH, RPOP, and LLEN. The map size is small, usually fewer than 10 entries, so HGETALL shouldn't cost too much, and the other commands should be O(1).
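A minimal sketch of that access pattern (key names and values are hypothetical, and the command mappings in the comments are approximate):

    import org.redisson.api.RDeque;
    import org.redisson.api.RMap;
    import org.redisson.api.RedissonClient;

    // Sketch only: "client" stands for an existing RedissonClient instance.
    void accessPattern(RedissonClient client) {
        // Small hash accessed via HSET/HGET/HGETALL; usually fewer than 10 entries.
        RMap<String, String> map = client.getMap("someKey");        // hypothetical key
        map.put("field", "value");          // HSET
        String value = map.get("field");    // HGET
        map.readAllMap();                   // HGETALL

        // Deque accessed via LPUSH/RPOP/LLEN.
        RDeque<String> deque = client.getDeque("someQueue");        // hypothetical key
        deque.addFirst("item");             // LPUSH
        String tail = deque.pollLast();     // RPOP
        int size = deque.size();            // LLEN
    }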

I also noticed a pattern: although there are many errors like "Redis server response timeout (3000 ms) occured after 0 retry attempts", the retries mostly succeed, because I only see a few "Redis server response timeout (3000 ms) occured after 3 retry attempts". In other words, could it mean that there are too many requests, causing a shortage of connections or thread blocking, which then causes the timeouts? That is only my guess.

Currently all my settings are defaults.

By the way, I have a question: if I set keepAlive to true, does that mean the connection pool will not release idle connections?

Thanks for your help.

yinyansheng avatar Sep 18 '22 12:09 yinyansheng

@yinyansheng

Thread blocking with retryAttempts = 0 was fixed in the 3.17.6 version.

mrniko avatar Sep 19 '22 06:09 mrniko

Thread blocking with retryAttempts = 0 was fixed in the 3.17.6 version.

OK, let me try the changes below; if there's any progress, I will report back here. 1. Upgrade to the latest version (3.17.6). 2. Config changes:

    config.useClusterServers()
            .addNodeAddress(nodeAddressArray)
            .setIdleConnectionTimeout(30_000)
            .setKeepAlive(true);

    config.setAddressResolverGroupFactory(new DnsAddressResolverGroupFactory())
            .setNettyThreads(64)
            .setCheckLockSyncedSlaves(false);

yinyansheng avatar Sep 19 '22 08:09 yinyansheng

@yinyansheng

Any update?

mrniko avatar Nov 17 '22 08:11 mrniko

@mrniko We are facing the same issue after upgrading from 3.12.4 to 3.17.7 a few hours ago. Should we further increase the timeout from 5000 ms? We also increased nettyThreads from 256 to 400, which reduced the errors a bit but didn't stop them.

"org.redisson.client.RedisResponseTimeoutException: Redis server response timeout (5000 ms) occured after 0 retry attempts, is non-idempotent command: true Check connection with Redis node: 172.26.45.182/172.26.45.182:6379 for TCP packet drops. Try to increase nettyThreads and/or timeout settings. Command: (LPOP), params: [JobStopQueue172.26.134.232], channel: [id: 0x7103228e, L:/172.26.134.232:50494 - R:172.26.45.182/172.26.45.182:6379]\n\tat org.redisson.command.RedisExecutor.lambda$scheduleResponseTimeout$8(RedisExecutor.java:387)\n\tat io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)\n\tat io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)\n\tat io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)\n\tat io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)\n\tat io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat java.lang.Thread.run(Thread.java:748)\n",

rgampa avatar Nov 19 '22 21:11 rgampa

@yinyansheng

Any update?

After changing the config, there are still a few errors, but I think it is normal; with retries, most requests are executed successfully.

If anybody has a similar issue under a large volume of requests, try increasing the Netty thread size.

yinyansheng avatar Nov 22 '22 03:11 yinyansheng

@yinyansheng I increased nettyThreads from 256 to 400 and it made a little improvement, but I still see 4-5 errors per minute per application instance. @mrniko can you confirm whether there is any functional impact, such as missing events for LPOP? We also retry one more time outside Redisson's retryAttempts, as in the sketch below. So far I haven't heard any customer complaints after the upgrade, but the errors are creating a lot of noise.
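That external retry could look roughly like the hypothetical helper below; it is a sketch of the approach described, not the actual application code:

    import org.redisson.api.RQueue;
    import org.redisson.client.RedisResponseTimeoutException;

    public final class QueuePollRetry {
        // Hypothetical helper: one extra attempt at the application level,
        // outside Redisson's own retryAttempts, for the RQueue.poll() (LPOP) case.
        public static <T> T pollWithOneRetry(RQueue<T> queue) {
            try {
                return queue.poll();
            } catch (RedisResponseTimeoutException e) {
                return queue.poll(); // second and final attempt
            }
        }
    }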

rgampa avatar Nov 22 '22 03:11 rgampa

@rgampa

I think it is not normal. Is there possibly a server-side issue? Or can you post your config here?

yinyansheng avatar Nov 22 '22 03:11 yinyansheng

@yinyansheng The errors spiked only after the upgrade from 3.12.4 to 3.17.7; after rolling back the version on one server, the errors went back to historical levels.

Here is the config:

Config config = new Config();
config.setCodec(StringCodec.INSTANCE);
config.setNettyThreads(400);

SentinelServersConfig serverConfig = config.useSentinelServers()
			.setMasterName(redisClusterConfig.getSentinelMasterName())
			.setScanInterval(5000)
			.setReadMode(ReadMode.MASTER)
			.setTimeout(5000) 
			.setMasterConnectionMinimumIdleSize(8)
			.setMasterConnectionPoolSize(400)
			.setSlaveConnectionMinimumIdleSize(8)
			.setSlaveConnectionPoolSize(400)
			.setKeepAlive(true)
			.setPingConnectionInterval(30000)
			.setRetryAttempts(3)
			.setRetryInterval(1500)
			.setSentinelAddresses(redisClusterConfig.getConnectionPoints());

rgampa avatar Nov 22 '22 03:11 rgampa

I haven't met this issue, and I am using Redis cluster with 3.18.0. You can wait for @mrniko's response.

yinyansheng avatar Nov 22 '22 03:11 yinyansheng

@rgampa

We are facing the same issue after upgrading from 3.12.4 to 3.17.7 a few hours ago

The default codec was changed to MarshallingCodec in the 3.13.0 version. Did you try using the deprecated FstCodec, to check whether it affects response handling performance in your case?
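For anyone who wants to test that, a minimal sketch of setting the codec explicitly (only relevant if you rely on the default codec rather than configuring one yourself):

    import org.redisson.codec.FstCodec;
    import org.redisson.config.Config;

    Config config = new Config();
    // Since 3.13.0 the default codec is MarshallingCodec; to compare against the
    // earlier behaviour, the deprecated FstCodec can be set explicitly.
    config.setCodec(new FstCodec());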

mrniko avatar Nov 22 '22 10:11 mrniko

@mrniko we use config.setCodec(StringCodec.INSTANCE); I have posted the complete config above.

We used Jedis and Lettuce four years ago before migrating to Redisson; due to compatibility issues, we use StringCodec.

rgampa avatar Nov 22 '22 18:11 rgampa

@rgampa

Can you try 3.16.0 version? Which Redisson objects do you use?

mrniko avatar Nov 23 '22 07:11 mrniko

@mrniko RBucket, RList, RMap, RMapCache, RAtomicLong, RRateLimiter, RLock, RQueue. The issue is mostly observed with the LPOP command on RQueue's poll().
I will try version 3.16.0, but not anytime soon; the issue doesn't happen much in our QA environment, so it's hard to tell whether it is resolved.

rgampa avatar Nov 23 '22 07:11 rgampa

@rgampa

What is the object size returned by LPOP?

mrniko avatar Nov 23 '22 08:11 mrniko

@mrniko sorry for the late reply; most messages are 900 to 1000 bytes in size.

rgampa avatar Nov 28 '22 18:11 rgampa

@xyqshi

We are seeing the same issue with RedissonExecutorService.submit. After upgrading from 3.15 to 3.17.4, the latency of submitting a task spiked from single-digit ms to 1 s.

here is our config

            config.setNettyThreads(128)
                  .useClusterServers()
                  .addNodeAddress(String.format("rediss://%s:%s", CONTACT_POOL_EC_ADDRESS, CONTACT_POOL_EC_PORT))
                  .setTimeout(8000)
                  .setRetryInterval(50)
                  .setMasterConnectionPoolSize(256)
                  .setMasterConnectionMinimumIdleSize(16)
                  .setSlaveConnectionPoolSize(256)
                  .setSlaveConnectionMinimumIdleSize(16)
                  .setKeepAlive(true)
                  .setPingConnectionInterval(10000)
                  .setUsername( "default");

Unable to reproduce it with the 3.18.0 version and the test below:

        RExecutorService e = redisson.getExecutorService("test");
        long t = System.currentTimeMillis();
        for (int i = 0; i < 5000; i++) {
            e.submit(new RunnableTask());
        }
        assertThat(System.currentTimeMillis() - t).isLessThan(2000);
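For context, a minimal stand-in for the RunnableTask referenced above might look like this; the actual test class may differ, and the serializability note is an assumption based on tasks being sent to Redis:

    import java.io.Serializable;

    // Hypothetical stand-in for RunnableTask. Tasks submitted to RExecutorService
    // are serialized and stored in Redis, so the class is kept Serializable here.
    public class RunnableTask implements Runnable, Serializable {
        @Override
        public void run() {
            // no-op payload; the test above only measures submission latency
        }
    }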

mrniko avatar Nov 29 '22 08:11 mrniko

Thanks @mrniko for troubleshooting this. I will update this thread after upgrading to 3.18.0 and deploying to production, which will happen in Jan 2023. BTW, do you know as part of which issue this latency problem was resolved in 3.18.0?

rgampa avatar Nov 29 '22 18:11 rgampa

@rgampa Hi, have you solved the problem? I met the same problem when I updated the Redisson version to 3.17.7.

fumanix avatar Dec 16 '22 03:12 fumanix

Hi @fumanix, not yet. I will know the result in the 3rd or 4th week of Jan 2023, due to the current holiday peak-traffic freeze on production deployments.

rgampa avatar Dec 16 '22 04:12 rgampa

Due to this issue https://github.com/redisson/redisson/issues/4737, I would recommend you wait for the 3.19.0 version.

mrniko avatar Dec 16 '22 06:12 mrniko

Fixed

mrniko avatar Jan 30 '23 08:01 mrniko

@mrniko We finally deployed the Redisson 3.19.2 upgrade to production, but the errors are still happening.

org.redisson.client.RedisResponseTimeoutException: Redis server response timeout (5000 ms) occured after 0 retry attempts, is non-idempotent command: true Check connection with Redis node: 10.42.xx.xx/10.42.xx.xx:6379 for TCP packet drops.  Try to increase nettyThreads and/or timeout settings. Command: (LPOP), params: [queue], channel: [id: 0xcb755609, L:/172.17.xx.xx:40046 - R:10.42.xx.xx/10.42.xx.xx:6379]\n\tat org.redisson.command.RedisExecutor.lambda$scheduleResponseTimeout$9(RedisExecutor.java:402)\n\tat io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)\n\tat io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)\n\tat io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)\n\tat io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)\n\tat io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat java.lang.Thread.run(Thread.java:748)\n

rgampa avatar Mar 19 '23 18:03 rgampa

@rgampa

Can you share Redisson config?

mrniko avatar Mar 20 '23 10:03 mrniko