process hangs due to default socket_timeout=None
Version: master (https://github.com/redis/redis-py/tree/bea72995fd39b01e2f0a1682b16b6c7690933f36)
Platform: Python 3.10 on Ubuntu 20.04
Description:
There is no default socket_timeout set for connections, which can lead to outages: if a peer goes silent without closing the connection, a blocking read never returns. (Only tested with RedisCluster, but the Connection object appears to be shared with the single-node implementation.)
steps to reproduce
- Example code:

```python
from redis.cluster import RedisCluster as Redis
from time import sleep

rc = Redis(host="172.20.1.1", port=6379)
for x in range(10000):
    print(f"{x} : {rc.cluster_keyslot(x)}")
    rc.get(x)
    sleep(2)
```
- isolate the node hosting the keyslot; ensure there is a TCP timeout instead of the connection being closed, for example:
  - add an `iptables` DROP rule, or
  - `docker network disconnect NETWORK_ID CONTAINER_ID` (stopping/killing the container does NOT work, as the host might immediately generate a `no route to host` response)
expected behaviour:
redis-py raises a timeout error
actual behaviour:
the process hangs
relevant stack trace (gdb)
The stack trace shows `socket_timeout=None` on the blocked connection:
```
#12 Frame 0x558935950720, for file /workspace/redis-cluster-issue/redis-py/redis/connection.py, line 192, in _read_from_socket (self=<SocketBuffer(_sock=<socket at remote 0x7f01866520e0>, socket_read_size=65536, socket_timeout=None, _buffer=<_io.BytesIO at remote 0x7f0186bd4f90>, bytes_written=0, bytes_read=0) at remote 0x7f01863382e0>, length=None, timeout=<object at remote 0x7f0187458df0>, raise_on_timeout=True, sock=<...>, socket_read_size=65536, buf=<_io.BytesIO at remote 0x7f0186bd4f90>, marker=0, custom_timeout=False)
    data = self._sock.recv(socket_read_size)
#32 Frame 0x7f018667d970, for file /workspace/redis-cluster-issue/redis-py/redis/connection.py, line 824, in read_response (self=<Connection(pid=77375, host='172.20.2.2', port=6379, db=0, username=None, client_name=None, password=None, socket_timeout=None, socket_connect_timeout=None, socket_keepalive=None, socket_keepalive_options={}, socket_type=0, retry_on_timeout=False, retry_on_error=[], retry=<Retry(_backoff=<NoBackoff(_backoff=0) at remote 0x7f018662b700>, _retries=0, _supported_errors=(<type at remote 0x558935865b00>, <type at remote 0x558935866680>, <type at remote 0x5589343d5380>)) at remote 0x7f018662b7c0>, health_check_interval=0, next_health_check=0, redis_connect_func=<method at remote 0x7f018736ca00>, encoder=<Encoder(encoding='utf-8', encoding_errors='strict', decode_responses=False) at remote 0x7f0186338160>, _sock=<socket at remote 0x7f01866520e0>, _socket_read_size=65536, _parser=<ClusterParser(socket_read_size=65536, encoder=<...>, _sock=<...>, _buffer=<SocketBuffer(_sock=<...>, socket_read_si...(truncated)
    response = self._parser.read_response(disable_decoding=disable_decoding)
```
suggested fixes
- enable keepalive per default
- add a sane default timeout (might interfere with `BRPOP`/`BLPOP` -> increase the socket timeout dynamically if those are used?)
- improve documentation
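The keepalive suggestion boils down to enabling `SO_KEEPALIVE` (plus the Linux-specific probe knobs) on the underlying socket, which redis-py already exposes via `socket_keepalive`/`socket_keepalive_options` — it is just not the default. A stdlib-only sketch of what that sets (the tuning values are arbitrary examples):

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Turn TCP keepalive on; the kernel will probe idle connections and
# eventually error them out if the peer becomes unreachable.
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# Linux-specific tuning knobs; guarded because other platforms may not
# expose all of them. Values are arbitrary examples.
if all(hasattr(socket, n) for n in ("TCP_KEEPIDLE", "TCP_KEEPINTVL", "TCP_KEEPCNT")):
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # idle seconds before first probe
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # seconds between probes
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)     # failed probes before reset

keepalive_enabled = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
print(keepalive_enabled)  # True
s.close()
```

With keepalive on, even a connection with `socket_timeout=None` would eventually fail once the kernel's probes go unanswered, instead of hanging indefinitely.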
additional info
https://docs.python.org/3/library/socket.html#socket.socket.settimeout
(There is a related issue (#722) outlining some workarounds; I submitted this one to discuss changing the defaults to more sensible values.)
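The underlying `settimeout` semantics are easy to demonstrate with the stdlib alone. A minimal sketch simulating a peer that accepts the connection and then goes silent, like the isolated cluster node:

```python
import socket

# A listening socket that accepts a connection but never sends data,
# simulating a peer that has gone silent.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)

cli = socket.create_connection(srv.getsockname())
conn, _ = srv.accept()  # keep a reference so the peer side stays open

# With the default timeout of None, cli.recv(1) would block forever.
# After settimeout(), the same call raises socket.timeout instead.
cli.settimeout(0.5)
try:
    cli.recv(1)
    timed_out = False
except socket.timeout:
    timed_out = True

print(timed_out)  # True

for s in (cli, conn, srv):
    s.close()
```

This is exactly the difference between the reported hang and the expected behaviour: the `recv()` in the stack trace runs on a socket whose timeout was never set.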