
When a redis server drops an already established connection, it fails loudly

Open mattrobenolt opened this issue 9 years ago • 9 comments

This can be reproduced by setting a 1-second idle timeout on the Redis server (CONFIG SET timeout 1), then running:

from time import sleep
from rb import Cluster

cluster = Cluster({0: {'host': '127.0.0.1', 'port': 6379}})

# First map() establishes a connection, which is returned to the pool.
with cluster.map() as c:
    c.get('foo')

# Sleep past the server-side timeout so Redis closes the idle connection.
sleep(2)

# The pool hands back the now-dead connection and the write fails.
with cluster.map() as c:
    c.get('foo')

This fails with redis.exceptions.ConnectionError: Error while trying to write requests to redis.
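For reference, the server-side idle timeout used in the repro above can be set at runtime from the CLI (assuming a Redis instance on the default local port):

```shell
# Tell Redis to drop client connections that stay idle for ~1 second.
redis-cli CONFIG SET timeout 1

# Verify the setting took effect.
redis-cli CONFIG GET timeout
```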

The problem is that the first poll on the socket returns a close event before any data has been written to the socket.

We should detect this and attempt to reconnect.
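A minimal sketch of the detection suggested above (this is not rb's actual internals, and sock stands for any connected stream socket): poll the socket for readability before writing, and treat a readable socket that peeks zero bytes as closed by the peer.

```python
import select
import socket


def connection_is_closed(sock):
    """Return True if the peer has closed this stream connection.

    A socket that selects as readable but peeks zero bytes has hit
    EOF: the server dropped us (e.g. Redis's `timeout` expired while
    the connection sat idle in the pool).
    """
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False  # no pending events: the idle connection looks alive
    try:
        data = sock.recv(1, socket.MSG_PEEK)  # peek without consuming data
    except OSError:
        return True  # e.g. ECONNRESET: definitely dead
    return data == b''  # zero bytes on a readable socket means EOF
```

A pool could run this check before handing out an idle connection and reconnect instead of writing into a dead socket; this is roughly what newer redis-py versions do with their health_check_interval option, using a PING rather than a peek.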

mattrobenolt avatar Apr 22 '16 00:04 mattrobenolt

On Linux, this manifests slightly differently and fails in a different place:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/local/lib/python2.7/site-packages/rb/clients.py", line 577, in __exit__
    self.mapping_client.join(timeout=timeout)
  File "/usr/local/lib/python2.7/site-packages/rb/clients.py", line 366, in join
    command_buffer.wait_for_responses(self)
  File "/usr/local/lib/python2.7/site-packages/rb/clients.py", line 208, in wait_for_responses
    self.connection, command_name)
  File "/usr/local/lib/python2.7/site-packages/redis/client.py", line 585, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 577, in read_response
    response = self._parser.read_response()
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 238, in read_response
    response = self._buffer.readline()
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 168, in readline
    self._read_from_socket()
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 143, in _read_from_socket
    (e.args,))
redis.exceptions.ConnectionError: Error while reading from socket: ('Connection closed by server.',)

mattrobenolt avatar Apr 23 '16 22:04 mattrobenolt

In our case this Redis connection drop apparently causes a 500 error when posting one of the events to the API. Any idea why the connection drops?

Jan 20 11:02:29 xxx sentry: 127.0.0.1 - - [20/Jan/2017:19:02:29 +0000] "POST /api/22/store/ HTTP/1.1" 200 366 "-" "raven-python/5.4.0.dev0"
Jan 20 11:02:30 xxx sentry: Traceback (most recent call last):
  File "/var/lib/sentry/lib/python2.7/site-packages/sentry/web/api.py", line 102, in dispatch
    *args, **kwargs)
  File "/var/lib/sentry/lib/python2.7/site-packages/sentry/web/api.py", line 225, in _dispatch
    **kwargs
  File "/var/lib/sentry/lib/python2.7/site-packages/sentry/../django/views/generic/base.py", line 87, in dispatch
    return handler(request, *args, **kwargs)
  File "/var/lib/sentry/lib/python2.7/site-packages/sentry/web/api.py", line 286, in post
    response_or_event_id = self.process(request, data=data, **kwargs)
  File "/var/lib/sentry/lib/python2.7/site-packages/sentry/web/api.py", line 369, in process
    (app.tsdb.models.organization_total_received, project.organization_id),
  File "/var/lib/sentry/lib/python2.7/site-packages/sentry/tsdb/redis.py", line 174, in incr_multi
    self.calculate_expiry(rollup, max_values, timestamp),
  File "/var/lib/sentry/lib/python2.7/site-packages/rb/clients.py", line 623, in __exit__
    self.mapping_client.join(timeout=timeout)
  File "/var/lib/sentry/lib/python2.7/site-packages/rb/clients.py", line 412, in join
    command_buffer.wait_for_responses(self)
  File "/var/lib/sentry/lib/python2.7/site-packages/rb/clients.py", line 226, in wait_for_responses
    self.connection, command_name, **options)
  File "/var/lib/sentry/lib/python2.7/site-packages/redis/client.py", line 585, in parse_response
    response = connection.read_response()
  File "/var/lib/sentry/lib/python2.7/site-packages/redis/connection.py", line 577, in read_response
    response = self._parser.read_response()
  File "/var/lib/sentry/lib/python2.7/site-packages/redis/connection.py", line 357, in read_response
    (e.args,))
ConnectionError: Error while reading from socket: ('Connection closed by server.',)
Jan 20 11:02:30 xxx sentry: 19:02:30 [ERROR] sentry: Error while reading from socket: ('Connection closed by server.',)
Jan 20 11:02:30 xxx sentry: 127.0.0.1 - - [20/Jan/2017:19:02:30 +0000] "POST /api/22/store/ HTTP/1.1" 200 366 "-" "raven-python/5.4.0.dev0"
Jan 20 11:02:30 xxx sentry: 127.0.0.1 - - [20/Jan/2017:19:02:30 +0000] "POST /api/2/store/ HTTP/1.1" 500 337 "-" "raven-python/5.4.0.dev0"

benh57 avatar Jan 20 '17 19:01 benh57

Many reasons. The network layer could be flaky, etc. In our case it happened very specifically while using Docker's networking stack. Using --net=host on everything worked around the issue, since we didn't need the extra networking abstraction and it provided us no value.

mattrobenolt avatar Jan 20 '17 19:01 mattrobenolt

Just hit this as well. Did anyone figure out a cause or a fix?

ffledgling avatar Oct 26 '18 20:10 ffledgling

Having the same issue on K8s with a fresh redis-ha chart with HAProxy enabled.

Antiarchitect avatar Dec 21 '19 16:12 Antiarchitect

Hi @Antiarchitect, I have the same issue. How did you solve it?

linux0x5c avatar Jan 14 '20 01:01 linux0x5c

Hi @linux0x5c. Since I have an isolated k8s environment, I set in redis.conf:

timeout 0

in haproxy.cfg

timeout server 0
timeout client 0

These settings drastically reduced the number of failing connections, but they are considered bad practice in an open environment. I actually do not know why timeout 10m did not work, as it is twice the standard Redis keepalive of 300 seconds.
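An alternative to disabling timeouts entirely is enabling TCP keepalive on the client connections, so proxies like HAProxy keep seeing traffic on otherwise-idle connections. Below is a stdlib-only sketch of the mechanism; the TCP_KEEP* option names are Linux-specific, and the default values here are illustrative assumptions, not recommendations:

```python
import socket


def enable_keepalive(sock, idle=60, interval=15, count=3):
    """Enable TCP keepalive probes on a connected TCP socket.

    The kernel probes an idle connection after `idle` seconds, every
    `interval` seconds, up to `count` times, so stateful middleboxes
    keep seeing packets and the client learns quickly when the peer
    is gone. The TCP_KEEP* constants are Linux-specific, hence the
    hasattr() guards.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, 'TCP_KEEPIDLE'):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, 'TCP_KEEPINTVL'):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, 'TCP_KEEPCNT'):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock
```

redis-py exposes the same switches on its connections as socket_keepalive=True plus a socket_keepalive_options dict keyed by the TCP_KEEP* constants; whether rb forwards those through its pool options is an assumption about rb's plumbing that you would need to verify for your version.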

Hope this helps.

Antiarchitect avatar Jan 14 '20 14:01 Antiarchitect

@Antiarchitect, OK, thanks for your help. I will try it.

linux0x5c avatar Jan 15 '20 03:01 linux0x5c

Setting timeout 0 didn't fix this issue for me.

kimxogus avatar Jun 29 '20 06:06 kimxogus