redis-rb
Issue with redis timeout: commands get executed twice.
- Reproducible with the redis gem (3.3.1), on CentOS 6/7.
- TCP connection.
Issue overview
I found that the redis command 'zremrangebyscore' returned a wrong count (0) of removed items even though it had actually purged them. The deleted zset spans a huge range, so the command took longer than the default timeout (5 sec); the slow query log showed around 9 sec. The Redis client didn't raise a timeout error; instead it returned a wrong count of '0' because a duplicate copy of the query was executed.
I understand there may be a grey area in deciding whether a command has timed out or not; it would be helpful to understand this behaviour better.
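For context, this is roughly the call pattern that shows the symptom (the key name and score range are made up for illustration; the timeout is just the client default):

```ruby
require "redis"

# Client uses the default 5-second read timeout.
redis = Redis.new(host: "127.0.0.1", port: 6379)

# Assume "big_zset" is large enough that this purge takes ~9 seconds server-side.
# Expected: the number of removed members, or at least a timeout error.
# Observed: 0, because the command is silently re-sent after the read times out
# and the second execution finds nothing left to remove.
removed = redis.zremrangebyscore("big_zset", 0, 1_000_000)
puts removed  # => 0
```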
Observations
- Please find the script gist to reproduce the issue. It will show the duplicate queries in the monitored commands; you may have to adjust the timeout based on the machine it is run on.
- Run 'ruby redis_pump.rb pump' to create the dataset.
- Run 'ruby redis_pump.rb test' to simulate the issue.
- Timeout errors: the duplicate query appears even when a timeout error is raised. Pass a lower value via 'TEST_TIMEOUT=' to the above script to see this behaviour.
- I spent some time trying to understand what is causing the issue from the code. I couldn't simulate it with the TCP write_nonblock. Is the duplicate query caused by the timeout from the read on IO.select? Any details would be helpful.
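The gist itself isn't inlined here, but the reproduction is roughly shaped like this (the key name, dataset size, and TEST_TIMEOUT handling are illustrative, not the actual script):

```ruby
# redis_pump.rb (illustrative approximation, not the original gist)
require "redis"

KEY     = "big_zset"
TIMEOUT = (ENV["TEST_TIMEOUT"] || 5).to_f

redis = Redis.new(timeout: TIMEOUT)

case ARGV[0]
when "pump"
  # Build a zset large enough that removing it exceeds the client timeout.
  5_000_000.times.each_slice(10_000) do |slice|
    redis.zadd(KEY, slice.map { |i| [i, "member:#{i}"] })
  end
when "test"
  # Run `redis-cli monitor` in another terminal to watch for the duplicate command.
  puts redis.zremrangebyscore(KEY, "-inf", "+inf")
end
```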
This sounds like the exact same issue Sidekiq users recently noticed.
Commands are retried up to reconnect_attempts times, which can be configured on client creation (see initialize).
I have to admit it can be kinda unexpected.
Can you retry your test with :reconnect_attempts => 0? Does the mentioned problem still occur?
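For reference, the option is passed when the client is created, e.g.:

```ruby
require "redis"

# With reconnect_attempts: 0 a slow command surfaces as Redis::TimeoutError
# instead of being transparently re-sent on a fresh connection.
redis = Redis.new(host: "127.0.0.1", port: 6379, timeout: 5, reconnect_attempts: 0)
```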
Thanks @badboy, and sorry for the delay in my reply. I have tested the same with reconnect_attempts: 0 and it solves the problem. It now raises the exception, since the socket read times out the first time and no retry happens from client.rb (thanks for pointing this out, it helped me understand what was happening).
Questions:
- The Client.io method is able to differentiate between a timeout and a connection error (Errno::ECONNRESET, Errno::EPIPE, Errno::ECONNABORTED, Errno::EBADF, Errno::EINVAL are treated as connection errors), which ensures the connection itself is fine; see the sketch after these questions.
- Should TimeoutError be considered a BaseConnectionError for the retry logic?
- Would it be better not to retry in this case?
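For reference, my reading of that distinction, condensed from client.rb; this is a simplified paraphrase, not the exact source:

```ruby
# Simplified paraphrase of Redis::Client#io in redis-rb 3.x (not verbatim).
def io
  yield
rescue TimeoutError
  # The socket read/write exceeded the configured timeout.
  raise TimeoutError, "Connection timed out"
rescue Errno::ECONNRESET, Errno::EPIPE, Errno::ECONNABORTED, Errno::EBADF, Errno::EINVAL => e
  # The connection itself is broken.
  raise ConnectionError, "Connection lost (#{e.class})"
end
```

If I read errors.rb correctly, TimeoutError is itself a subclass of BaseConnectionError in 3.x, so the retry path treats a timed-out read the same as a dropped connection, which seems to be where the duplicate execution comes from.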
Thanks for the help.
I'd have to look into the code to see whether we could handle this differently. I'll keep this open until I eventually get around to taking a look.
@badboy Where in the code does the command get retried? I don't quite see it. I only see ensure_connected retrying connect.
I've hit the same problem and am trying to understand what is happening.
Upd: never mind, I missed the yield inside ensure_connected.
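For anyone else tracing this, the retry wraps the whole command, roughly like this (simplified, not the verbatim source):

```ruby
# Simplified paraphrase of Redis::Client#ensure_connected in redis-rb 3.x.
def ensure_connected
  attempts = 0
  begin
    attempts += 1
    connect unless connected?
    yield  # the command is written and read inside the yield, so a retry re-sends it
  rescue BaseConnectionError
    disconnect
    # TimeoutError is a BaseConnectionError, so a slow command lands here too
    # and gets re-sent while reconnect attempts remain (default: 1).
    if attempts <= @options[:reconnect_attempts]
      retry
    else
      raise
    end
  end
end
```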
Closing, since it's somewhat expected with the default reconnect_attempts: 1. I'm not a huge fan of this default, and I may change it in 5.0 (not sure), but in the meantime this is documented behavior.