
process hangs due to default socket_timeout=None

Open mschfh opened this issue 3 years ago • 5 comments

Version: master (https://github.com/redis/redis-py/tree/bea72995fd39b01e2f0a1682b16b6c7690933f36)

Platform: Python 3.10 on Ubuntu 20.04

Description:

No default socket_timeout is set for connections, which can lead to outages: if a peer silently stops responding, the process blocks indefinitely. (Only tested with RedisCluster, but the connection object appears to be shared with the single-node implementation.)

steps to reproduce

  1. Run the example code:

from redis.cluster import RedisCluster as Redis
from time import sleep

rc = Redis(host="172.20.1.1", port=6379)

for x in range(10000):
    print(f"{x} : {rc.cluster_keyslot(x)}")
    rc.get(x)  # hangs once the node owning this keyslot becomes unreachable
    sleep(2)
  2. Isolate the node hosting the keyslot, ensuring the connection experiences a TCP timeout rather than being closed, for example:
    • add an iptables DROP rule, or
    • run docker network disconnect NETWORK_ID CONTAINER_ID (stopping/killing the container does NOT work, as the host may immediately respond with "no route to host")
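The hang itself is plain socket behaviour, not anything cluster-specific. A minimal sketch (stdlib sockets only, not redis-py code): a local server that accepts a connection but never replies stands in for the isolated node behind the DROP rule. With the default timeout of None the recv() would block forever; with a timeout set, it raises.

```python
import socket
import threading

# Server that accepts a connection and then stays silent, mimicking an
# isolated Redis node whose packets are being dropped.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

conns = []
def serve():
    conn, _ = srv.accept()
    conns.append(conn)  # keep the peer socket open, but never send anything
threading.Thread(target=serve, daemon=True).start()

client = socket.create_connection(("127.0.0.1", port))
# With the library default (timeout=None) the recv() below blocks forever.
client.settimeout(1.0)
try:
    client.recv(65536)
    result = "no timeout"
except socket.timeout:
    result = "timed out"
client.close()
print(result)  # timed out
```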

expected behaviour

redis-py raises a timeout error

actual behaviour

the process hangs

relevant Stack trace (gdb)

The stack trace shows socket_timeout=None in both the socket buffer and the connection object:

#12 Frame 0x558935950720, for file /workspace/redis-cluster-issue/redis-py/redis/connection.py, line 192, in _read_from_socket (self=<SocketBuffer(_sock=<socket at remote 0x7f01866520e0>, socket_read_size=65536, socket_timeout=None, _buffer=<_io.BytesIO at remote 0x7f0186bd4f90>, bytes_written=0, bytes_read=0) at remote 0x7f01863382e0>, length=None, timeout=<object at remote 0x7f0187458df0>, raise_on_timeout=True, sock=<...>, socket_read_size=65536, buf=<_io.BytesIO at remote 0x7f0186bd4f90>, marker=0, custom_timeout=False)
    data = self._sock.recv(socket_read_size)

#32 Frame 0x7f018667d970, for file /workspace/redis-cluster-issue/redis-py/redis/connection.py, line 824, in read_response (self=<Connection(pid=77375, host='172.20.2.2', port=6379, db=0, username=None, client_name=None, password=None, socket_timeout=None, socket_connect_timeout=None, socket_keepalive=None, socket_keepalive_options={}, socket_type=0, retry_on_timeout=False, retry_on_error=[], retry=<Retry(_backoff=<NoBackoff(_backoff=0) at remote 0x7f018662b700>, _retries=0, _supported_errors=(<type at remote 0x558935865b00>, <type at remote 0x558935866680>, <type at remote 0x5589343d5380>)) at remote 0x7f018662b7c0>, health_check_interval=0, next_health_check=0, redis_connect_func=<method at remote 0x7f018736ca00>, encoder=<Encoder(encoding='utf-8', encoding_errors='strict', decode_responses=False) at remote 0x7f0186338160>, _sock=<socket at remote 0x7f01866520e0>, _socket_read_size=65536, _parser=<ClusterParser(socket_read_size=65536, encoder=<...>, _sock=<...>, _buffer=<SocketBuffer(_sock=<...>, socket_read_si...(truncated)
    response = self._parser.read_response(disable_decoding=disable_decoding)

suggested fixes

  • enable TCP keepalive by default
  • add a sane default timeout (this might interfere with blocking commands such as BRPOP / BLPOP -> increase the socket timeout dynamically when those are used?)
  • improve the documentation

additional info

https://docs.python.org/3/library/socket.html#socket.socket.settimeout
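For reference, the three timeout modes described in those docs, shown on a bare stdlib socket (a fresh socket defaults to None, the same blocking mode redis-py inherits):

```python
import socket

s = socket.socket()
print(s.gettimeout())  # None -> blocking mode (the redis-py default): calls can hang forever

s.settimeout(5.0)      # timeout mode: blocking calls raise socket.timeout after 5 seconds
print(s.gettimeout())  # 5.0

s.settimeout(0)        # non-blocking mode: calls fail immediately if they would block
print(s.gettimeout())  # 0.0
s.close()
```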

(There is a related issue (#722) outlining some workarounds; I submitted this one to discuss changing the defaults to more sensible values.)

mschfh · Jun 22 '22