redis-rb
Issue with redis timeout: commands get executed twice.
- Reproducible with the redis gem (3.3.1), on CentOS 6/7.
- TCP connection.
Issue overview
I found that the redis command 'zremrangebyscore' returned a wrong count (0) of removed items even though it had actually purged them. The deleted zset spans a huge range, so the command took longer than the default timeout (5 sec); the slow query log showed around 9 sec. The Redis client didn't raise a timeout error; instead it returned a wrong count of '0' because a duplicate copy of the query was executed.
I understand there may be a grey area in deciding whether a command has timed out or not; it would be helpful to understand this behaviour better.
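For context, this is roughly the call pattern that shows the symptom (the key name and score range are made up for illustration; the timeout is just the client default):

```ruby
require "redis"

# Client uses the default 5-second read timeout.
redis = Redis.new(host: "127.0.0.1", port: 6379)

# Assume "big_zset" is large enough that this purge takes ~9 seconds server-side.
# Expected: the number of removed members, or at least a timeout error.
# Observed: 0, because the command is silently re-sent after the read times out
# and the second execution finds nothing left to remove.
removed = redis.zremrangebyscore("big_zset", 0, 1_000_000)
puts removed  # => 0
```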
Observations
- Please find the script gist to reproduce the issue. It will show the duplicate queries in the monitored commands; you may have to adjust the timeout based on the machine it is run on.
- Run 'ruby redis_pump.rb pump' to create the dataset.
- Run 'ruby redis_pump.rb test' to simulate the issue.
- Timeout errors: the duplicate query appears even when a timeout error is raised. Pass a lower value via 'TEST_TIMEOUT=' to the above script to see this behaviour.
- I spent some time trying to understand what is causing the issue from the code. I couldn't simulate it with the TCP write_nonblock. Is the duplicate query caused by the timeout from the read on IO.select? Any details would be helpful.
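The gist itself isn't inlined here, but the reproduction is roughly shaped like this (the key name, dataset size, and TEST_TIMEOUT handling are illustrative, not the actual script):

```ruby
# redis_pump.rb (illustrative approximation, not the original gist)
require "redis"

KEY     = "big_zset"
TIMEOUT = (ENV["TEST_TIMEOUT"] || 5).to_f

redis = Redis.new(timeout: TIMEOUT)

case ARGV[0]
when "pump"
  # Build a zset large enough that removing it exceeds the client timeout.
  5_000_000.times.each_slice(10_000) do |slice|
    redis.zadd(KEY, slice.map { |i| [i, "member:#{i}"] })
  end
when "test"
  # Run `redis-cli monitor` in another terminal to watch for the duplicate command.
  puts redis.zremrangebyscore(KEY, "-inf", "+inf")
end
```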
This sounds like the exact same issue Sidekiq users recently noticed.
Commands are retried up to reconnect_attempts times, which can be configured on client creation (see initialize).
I have to admit it can be kinda unexpected.
Can you retry your test with :reconnect_attempts => 0? Does the mentioned problem still occur?
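For reference, the option is passed when the client is created, e.g.:

```ruby
require "redis"

# With reconnect_attempts: 0 a slow command surfaces as Redis::TimeoutError
# instead of being transparently re-sent on a fresh connection.
redis = Redis.new(host: "127.0.0.1", port: 6379, timeout: 5, reconnect_attempts: 0)
```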
Thanks @badboy, and sorry for the delay in my reply. I have tested the same with reconnect_attempts: 0 and it solves the problem. It now raises the exception, since the socket read times out the first time and no retry happens from client.rb (thanks for pointing this out, it helped me understand what was happening).
Questions:
- The Client.io method is able to differentiate between a timeout and a connection error (Errno::ECONNRESET, Errno::EPIPE, Errno::ECONNABORTED, Errno::EBADF, Errno::EINVAL are treated as connection errors), which ensures the connection itself is fine; see the sketch after these questions.
- Should TimeoutError be considered a BaseConnectionError for the retry logic?
- Would it be better not to retry in this case?
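For reference, my reading of that distinction, condensed from client.rb; this is a simplified paraphrase, not the exact source:

```ruby
# Simplified paraphrase of Redis::Client#io in redis-rb 3.x (not verbatim).
def io
  yield
rescue TimeoutError
  # The socket read/write exceeded the configured timeout.
  raise TimeoutError, "Connection timed out"
rescue Errno::ECONNRESET, Errno::EPIPE, Errno::ECONNABORTED, Errno::EBADF, Errno::EINVAL => e
  # The connection itself is broken.
  raise ConnectionError, "Connection lost (#{e.class})"
end
```

If I read errors.rb correctly, TimeoutError is itself a subclass of BaseConnectionError in 3.x, so the retry path treats a timed-out read the same as a dropped connection, which seems to be where the duplicate execution comes from.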
Thanks for the help.
I'd have to look into the code to see whether we could handle this differently. I'll keep this open until I eventually get around to taking a look.
@badboy Where in the code does the command get retried? I don't quite see it. I only see ensure_connected retrying connect.
I've hit the same problem and am trying to understand what is happening.
Upd: never mind, I missed the yield inside ensure_connected.
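For anyone else tracing this, the retry wraps the whole command, roughly like this (simplified, not the verbatim source):

```ruby
# Simplified paraphrase of Redis::Client#ensure_connected in redis-rb 3.x.
def ensure_connected
  attempts = 0
  begin
    attempts += 1
    connect unless connected?
    yield  # the command is written and read inside the yield, so a retry re-sends it
  rescue BaseConnectionError
    disconnect
    # TimeoutError is a BaseConnectionError, so a slow command lands here too
    # and gets re-sent while reconnect attempts remain (default: 1).
    if attempts <= @options[:reconnect_attempts]
      retry
    else
      raise
    end
  end
end
```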
Closing, since it's somewhat expected with the default reconnect_attempts: 1. I'm not a huge fan of this default, and I may change it in 5.0 (not sure), but in the meantime this is documented behavior.