Redisson client after upgrade to 3.17.4 causing a lot of timeout issues
Hello, after upgrading the Redisson client from 3.15.5 to 3.17.4, we are facing a lot of timeout issues.
org.redisson.client.RedisResponseTimeoutException: Redis server response timeout (1000 ms) occured after 0 retry attempts. Increase nettyThreads and/or timeout settings. Try to define pingConnectionInterval setting. Command: (GET), params: [accessPolicy.6JPYPCDDKN8F], channel: [id: 0x4cd3581a, L:/10.33.250.201:57000 - R:10.45.73.199/10.45.73.199:6379] at org.redisson.command.RedisExecutor.lambda$scheduleResponseTimeout$5(RedisExecutor.java:341) at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715) ... 6 truncated
If we roll back to the previous version 3.15.5, we don't see the issue. Please let us know if this is a known issue and how we can get rid of it.
Thanks, Pranay
We are having the same issue with RedissonExecutorService.submit. After upgrading from 3.15 to 3.17.4, the latency of submitting a task spiked from single-digit ms to 1 s.
Here is our config:
config.setNettyThreads(128)
.useClusterServers()
.addNodeAddress(String.format("rediss://%s:%s", CONTACT_POOL_EC_ADDRESS, CONTACT_POOL_EC_PORT))
.setTimeout(8000)
.setRetryInterval(50)
.setMasterConnectionPoolSize(256)
.setMasterConnectionMinimumIdleSize(16)
.setSlaveConnectionPoolSize(256)
.setSlaveConnectionMinimumIdleSize(16)
.setKeepAlive(true)
.setPingConnectionInterval(10000)
.setUsername("default");
I met the same issue after upgrading to 3.17.4; below is the error log. Thanks.
RedisResponseTimeoutException: Redis server response timeout (3000 ms) occured after 3 retry attempts, is non-idempotent command: false. Increase nettyThreads and/or timeout settings. Try to define pingConnectionInterval setting. Command: (HSET), params: [XXXXXXXXXXX, PooledUnsafeDirectByteBuf(ridx: 0, widx: 24, cap: 256), PooledUnsafeDirectByteBuf(ridx: 0, widx: 544, cap: 1024)], channel: [id: 0x6ce9c160, L:/10.101.15.138:64890 - R:10.0.16.44/10.0.16.44:6379]
@yinyansheng Can you check which version exactly introduced this timeout for your application?
Try to set checkLockSyncedSlaves = false with the new version.
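A minimal sketch of how that setting can be applied (the address below is a placeholder, not taken from this thread):

Config config = new Config();
// skip the check that a lock write has been synced to the expected number of slaves
config.setCheckLockSyncedSlaves(false);
config.useClusterServers()
      .addNodeAddress("redis://127.0.0.1:6379"); // placeholder address
RedissonClient redisson = Redisson.create(config);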
@mrniko Sorry, I wrote the wrong version; my current version is 3.17.4. It still happens occasionally, several times a week.
Here is an error sample: RedisResponseTimeoutException: Redis server response timeout (3000 ms) occured after 3 retry attempts, is non-idempotent command: false. Increase nettyThreads and/or timeout settings. Try to define pingConnectionInterval setting. Command: (EXISTS), params: [XXXXXX], channel: [id: 0x083772ea, L:/10.101.14.49:27095 - R:10.0.30.151/10.0.30.151:6379]
My service has 200+ instances connecting to AWS ElastiCache (which contains 64 master nodes and 64 slave nodes), and the Redis call volume is extremely high.
Is it possible that too many instances or too many Redis nodes cause this issue?
I plan to make three changes as below:
1. upgrade to the latest version
2. change Netty threads from 32 to 64
3. change idle connection timeout from 10000 to 30000
Besides checkLockSyncedSlaves = false, is there any other config I can try?
I really appreciate your reply.
@yinyansheng
Which Redisson objects do you use?
Mainly the objects below:
RMap<String, String> map = redissonClient.getMap(key);
RDeque<String> deque = redissonClient.getDeque(key);
Commands like HSET, HGET, HGETALL, LPUSH, RPOP, LLEN. The map size is small, usually less than 10, so the HGETALL command should not cost too much; the other commands' time complexity should be O(1).
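For context, the access pattern described above looks roughly like this (key names are placeholders; the commands noted in the comments are the usual mapping, though exact behavior can vary by method and version):

RMap<String, String> map = redissonClient.getMap("someMapKey");   // placeholder key
map.fastPut("field", "value");               // HSET
String value = map.get("field");             // HGET
Map<String, String> all = map.readAllMap();  // HGETALL

RDeque<String> deque = redissonClient.getDeque("someDequeKey");   // placeholder key
deque.addFirst("item");                      // LPUSH
String tail = deque.pollLast();              // RPOP
int length = deque.size();                   // LLEN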
Also, I found a pattern: although there are many errors like "Redis server response timeout (3000 ms) occured after 0 retry attempts", most of them succeed on retry, because I only see a few "Redis server response timeout (3000 ms) occured after 3 retry attempts". In other words, does this mean there are too many requests, which may cause a lack of connections or thread blocking, and in turn the timeouts? That is only my guess.
Right now all my settings are default.
By the way, I have a question: if I set keepAlive to true, does it mean the connection pool will not release idle connections?
Thanks for your help.
@yinyansheng
Thread blocking with retryAttempts = 0 was fixed in the 3.17.6 version.
OK, let me try the changes below; if there is any progress, I will report back here:
1. upgrade to the latest version (3.17.6)
2. config change
config.useClusterServers()
      .addNodeAddress(nodeAddressArray)
      .setIdleConnectionTimeout(30_000)
      .setKeepAlive(true);
config.setAddressResolverGroupFactory(new DnsAddressResolverGroupFactory())
      .setNettyThreads(64)
      .setCheckLockSyncedSlaves(false);
@yinyansheng
Any update?
@mrniko We are facing the same issue after upgrading from 3.12.4 to 3.17.7 a few hours ago. Should we further increase the timeout beyond 5000 ms? We also increased nettyThreads from 256 to 400; it reduced the errors a bit, but didn't stop them.
org.redisson.client.RedisResponseTimeoutException: Redis server response timeout (5000 ms) occured after 0 retry attempts, is non-idempotent command: true Check connection with Redis node: 172.26.45.182/172.26.45.182:6379 for TCP packet drops. Try to increase nettyThreads and/or timeout settings. Command: (LPOP), params: [JobStopQueue172.26.134.232], channel: [id: 0x7103228e, L:/172.26.134.232:50494 - R:172.26.45.182/172.26.45.182:6379]
    at org.redisson.command.RedisExecutor.lambda$scheduleResponseTimeout$8(RedisExecutor.java:387)
    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
    at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
    at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
    at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
@yinyansheng
Any update?
After changing the config there are still a few errors, but I think that is normal; with retries, most requests are executed successfully.
If anybody has a similar issue under a mass of requests, they can try increasing the Netty thread size.
@yinyansheng I increased nettyThreads from 256 to 400; it made little improvement, and I still see 4-5 errors per minute per application instance. @mrniko Can you confirm whether there is any functional impact like missing events for LPOP? We also retry one more time outside the Redisson retryAttempts. So far I haven't heard any customer complaints after the upgrade, but the errors are creating a lot of noise.
@rgampa
I think that is not normal. Is there any possible server-side issue? Or can you post your config here?
@yinyansheng The errors spiked only after the upgrade from 3.12.4 to 3.17.7; after rolling back the version on one server, the errors went back to historical levels.
Here is the config:
Config config = new Config();
config.setCodec(StringCodec.INSTANCE);
config.setNettyThreads(400);
SentinelServersConfig serverConfig = config.useSentinelServers()
.setMasterName(redisClusterConfig.getSentinelMasterName())
.setScanInterval(5000)
.setReadMode(ReadMode.MASTER)
.setTimeout(5000)
.setMasterConnectionMinimumIdleSize(8)
.setMasterConnectionPoolSize(400)
.setSlaveConnectionMinimumIdleSize(8)
.setSlaveConnectionPoolSize(400)
.setKeepAlive(true)
.setPingConnectionInterval(30000)
.setRetryAttempts(3)
.setRetryInterval(1500)
.setSentinelAddresses(redisClusterConfig.getConnectionPoints());
I haven't met this issue, and I am using Redis Cluster with 3.18.0. You can wait for @mrniko's response.
@rgampa
We are facing the same issue after upgrading from 3.12.4 to 3.17.7 a few hours ago
The default codec was changed to MarshallingCodec in the 3.13.0 version. Did you try using the deprecated FstCodec, to check whether the codec affects response handling performance in your case?
@mrniko We use config.setCodec(StringCodec.INSTANCE); I have posted the complete config above.
Four years ago we used Jedis and Lettuce before migrating to Redisson, and due to compatibility issues we are using StringCodec.
@rgampa
Can you try 3.16.0 version? Which Redisson objects do you use?
@mrniko RBucket, RList, RMap, RMapCache, RAtomicLong, RRateLimiter, RLock, RQueue.
The issue is mostly observed with the LPOP command on RQueue's poll().
I will try with version 3.16.0, but not anytime soon; it doesn't happen much in our QA environment, so it's hard to tell if the issue is resolved.
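For context, the poll() call in question follows roughly this pattern (the queue name is a placeholder):

RQueue<String> queue = redissonClient.getQueue("someQueue"); // placeholder queue name
String msg = queue.poll(); // issues LPOP; returns null when the queue is empty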
@rgampa
What is the object size returned by LPOP?
@mrniko Sorry for the late reply; most messages are 900 to 1000 bytes in size.
@xyqshi
We are having the same issue with RedissonExecutorService.submit. After upgrading from 3.15 to 3.17.4, the latency of submitting a task spiked from single-digit ms to 1 s.
Here is our config:
config.setNettyThreads(128)
      .useClusterServers()
      .addNodeAddress(String.format("rediss://%s:%s", CONTACT_POOL_EC_ADDRESS, CONTACT_POOL_EC_PORT))
      .setTimeout(8000)
      .setRetryInterval(50)
      .setMasterConnectionPoolSize(256)
      .setMasterConnectionMinimumIdleSize(16)
      .setSlaveConnectionPoolSize(256)
      .setSlaveConnectionMinimumIdleSize(16)
      .setKeepAlive(true)
      .setPingConnectionInterval(10000)
      .setUsername("default");
Unable to reproduce it with the 3.18.0 version and the test below:
RExecutorService e = redisson.getExecutorService("test");
long t = System.currentTimeMillis();
for (int i = 0; i < 5000; i++) {
    e.submit(new RunnableTask());
}
assertThat(System.currentTimeMillis() - t).isLessThan(2000);
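For anyone trying to reproduce this, RunnableTask above is just a task class used in the test; a minimal stand-in could look like the sketch below (an assumption on my side: with codecs based on Java serialization, the submitted task needs to be Serializable, since it is serialized and executed by the executor's worker nodes):

import java.io.Serializable;

// hypothetical stand-in for RunnableTask
public class RunnableTask implements Runnable, Serializable {
    @Override
    public void run() {
        // the actual work executed on a worker node registered for "test"
    }
}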
Thanks @mrniko for troubleshooting this. I will update this thread after upgrading to 3.18.0 and deploying to production, which will happen in Jan 2023. BTW, do you know as part of which issue this latency problem was resolved in 3.18.0?
@rgampa Hi, have you solved the problem? I met the same problem when I updated the Redisson version to 3.17.7.
Hi @fumanix, not yet. I will know the result in the 3rd or 4th week of Jan 2023, due to the current holiday peak-traffic freeze on production deployments.
Due to this issue https://github.com/redisson/redisson/issues/4737 I would recommend you wait for the 3.19.0 version.
Fixed
@mrniko We finally deployed to production with the Redisson 3.19.2 upgrade, but the errors are still happening.
org.redisson.client.RedisResponseTimeoutException: Redis server response timeout (5000 ms) occured after 0 retry attempts, is non-idempotent command: true Check connection with Redis node: 10.42.xx.xx/10.42.xx.xx:6379 for TCP packet drops. Try to increase nettyThreads and/or timeout settings. Command: (LPOP), params: [queue], channel: [id: 0xcb755609, L:/172.17.xx.xx:40046 - R:10.42.xx.xx/10.42.xx.xx:6379]
    at org.redisson.command.RedisExecutor.lambda$scheduleResponseTimeout$9(RedisExecutor.java:402)
    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
    at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
    at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
    at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
@rgampa
Can you share Redisson config?