
ConnectionResetError: [Errno 104] Connection reset by peer

Open zzlpeter opened this issue 4 years ago • 58 comments

Hi, everybody! I use tornado + aioredis, and recently I hit this issue; the traceback is below. My environment: aioredis==1.2.0, tornado==5.1.1. I create the pool with aioredis.create_redis_pool(**args). Can anybody help? Thanks a lot.

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tornado/web.py", line 1699, in _execute
    result = await result
  File "/views/notice.py", line 341, in get
    items, page_size, total_page, total_size = await Notice.cache_or_api_list(notice_id_list, page_count, page_size)
  File "models/notice.py", line 136, in cache_or_api_list
    items = await cls.query_list(page_list)
  File "models/notice.py", line 92, in query_list
    items = await asyncio.gather(*[Notice.cache_or_api(notice_id) for notice_id in notice_id_list])
  File "models/notice.py", line 37, in cache_or_api
    info = await redis.execute('get', redis_key)
  File "models/notice.py", line 37, in cache_or_api
    info = await redis.execute('get', redis_key)
  File "models/notice.py", line 37, in cache_or_api
    info = await redis.execute('get', redis_key)
  [Previous line repeated 11 more times]
  File "/usr/local/lib/python3.6/site-packages/aioredis/connection.py", line 183, in _read_data
    obj = await self._reader.readobj()
  File "/usr/local/lib/python3.6/site-packages/aioredis/stream.py", line 94, in readobj
    await self._wait_for_data('readobj')
  File "/usr/local/lib/python3.6/asyncio/streams.py", line 464, in _wait_for_data
    yield from self._waiter
  File "/usr/local/lib/python3.6/asyncio/selector_events.py", line 723, in _read_ready
    data = self._sock.recv(self.max_size)
ConnectionResetError: [Errno 104] Connection reset by peer
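
For context, a hedged sketch of the kind of setup described above (aioredis 1.x API; the address, pool sizes, and helper names are assumptions for illustration, not the reporter's actual code):

import aioredis

async def init_redis():
    # create_redis_pool returns a high-level Redis object backed by a connection pool
    return await aioredis.create_redis_pool("redis://localhost:6379", minsize=5, maxsize=10)

async def cache_get(redis, redis_key):
    # mirrors the call seen in the traceback: redis.execute('get', ...)
    return await redis.execute("get", redis_key)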

zzlpeter avatar Jul 16 '20 08:07 zzlpeter

I had the same ConnectionResetError.

CL545740896 avatar Sep 09 '20 02:09 CL545740896

Can you please check out the latest master and test with that? Note that as of #891 the client has the same API as redis-py.

seandstewart avatar Mar 19 '21 00:03 seandstewart

Getting a few thousand of these a day when using Django Channels. Tested with Python 3.8.1 and Python 3.9.2, using aioredis 1.3.1 installed as a dependency of channels.

djstein avatar Mar 31 '21 18:03 djstein

@djstein Are you getting this in production? If you're experiencing this in development, please refer to #930 for migration needs, as the v1 major version will not get any fixes.

Andrew-Chen-Wang avatar Mar 31 '21 19:03 Andrew-Chen-Wang

This error still exists in aioredis==2.0.0 in production.

rushilsrivastava avatar Aug 11 '21 06:08 rushilsrivastava

I had the same ConnectionResetError.

RonaldinhoL avatar Sep 06 '21 14:09 RonaldinhoL

From my test, when this occurred, aioredis / redis-py / asyncio-redis can't connect, but aredis can. What is the difference between them?

RonaldinhoL avatar Sep 08 '21 07:09 RonaldinhoL

Hi all, the problem lies with the ConnectionError type. aioredis implements its own ConnectionError(builtins.ConnectionError, RedisError), which causes problems because ConnectionResetError is no longer a subclass of that ConnectionError. If one looks at line 37 of client.py, it overwrites ConnectionError so that the exception can no longer be caught on line 1067.
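
For illustration, a minimal, simplified sketch of that shadowing (not the actual aioredis source):

import builtins

class RedisError(Exception):
    pass

# Simplified stand-in for aioredis' own ConnectionError, which shadows the builtin
# name once imported into client.py.
class ConnectionError(builtins.ConnectionError, RedisError):
    pass

# The builtin ConnectionResetError derives from builtins.ConnectionError,
# but not from the class defined above:
print(issubclass(ConnectionResetError, builtins.ConnectionError))  # True
print(issubclass(ConnectionResetError, ConnectionError))           # False

try:
    raise ConnectionResetError("Connection reset by peer")
except ConnectionError:
    # never reached: the shadowed name no longer matches the builtin reset error
    print("caught by the redefined ConnectionError")
except builtins.ConnectionError:
    print("only the builtin name still catches it")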

eneloop2 avatar Sep 09 '21 15:09 eneloop2

Any workaround found for this issue?

shaakaud avatar Nov 04 '21 01:11 shaakaud

Same issue here in production. There's a firewall resetting connections after some time, and aioredis won't recover from this, which is a serious problem. Is there any known workaround?

Enchufa2 avatar Dec 02 '21 10:12 Enchufa2

Please, consider this issue in #1225.

Enchufa2 avatar Dec 02 '21 11:12 Enchufa2

This issue can be closed; it should've been solved with #1129.

rushilsrivastava avatar Dec 02 '21 11:12 rushilsrivastava

No, sorry, it doesn't solve this issue. Quick test:

  1. Run a Redis instance locally:

     $ docker run --rm -d -p 6379:6379 --name redis redis

  2. Run a simple test.py program that continually reads something from Redis. For example (ugly, but easy to translate to redis):

     from aioredis import Redis
     import asyncio, time

     redis = Redis()
     loop = asyncio.get_event_loop()
     loop.run_until_complete(redis.set("something", "something"))

     while True:
         print(loop.run_until_complete(redis.get("something")))
         time.sleep(5)

  3. Kill the connection (technically this is a connection abort, not a connection reset, but it has the same effect and unveils the same underlying issue):

     $ sudo ss -K dst [::1] dport = redis
  • The equivalent test program using redis instead of aioredis (see the sketch after the tracebacks below) automagically recovers from the loss of connection and keeps going without error.

  • aioredis 2.0.0 gives:

Traceback (most recent call last):
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 815, in send_packed_command
    await asyncio.wait_for(
  File "/usr/lib64/python3.9/asyncio/tasks.py", line 442, in wait_for
    return await fut
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 797, in _send_packed_command
    await self._writer.drain()
  File "/usr/lib64/python3.9/asyncio/streams.py", line 387, in drain
    await self._protocol._drain_helper()
  File "/usr/lib64/python3.9/asyncio/streams.py", line 190, in _drain_helper
    raise ConnectionResetError('Connection lost')
ConnectionResetError: Connection lost

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "***/test.py", line 9, in <module>
    print(loop.run_until_complete(redis.get("something")))
  File "/usr/lib64/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/client.py", line 1063, in execute_command
    await conn.send_command(*args)
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 840, in send_command
    await self.send_packed_command(
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 829, in send_packed_command
    raise ConnectionError(
aioredis.exceptions.ConnectionError: Error UNKNOWN while writing to socket. Connection lost.
  • aioredis from master, i.e. with #1129:
Traceback (most recent call last):
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 762, in disconnect
    await self._writer.wait_closed()
  File "/usr/lib64/python3.9/asyncio/streams.py", line 359, in wait_closed
    await self._protocol._get_close_waiter(self)
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 815, in send_packed_command
    await asyncio.wait_for(
  File "/usr/lib64/python3.9/asyncio/tasks.py", line 442, in wait_for
    return await fut
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 797, in _send_packed_command
    await self._writer.drain()
  File "/usr/lib64/python3.9/asyncio/streams.py", line 375, in drain
    raise exc
  File "/usr/lib64/python3.9/asyncio/selector_events.py", line 856, in _read_ready__data_received
    data = self._sock.recv(self.max_size)
ConnectionAbortedError: [Errno 103] Software caused connection abort

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "***/test.py", line 9, in <module>
    print(loop.run_until_complete(redis.get("something")))
  File "/usr/lib64/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/client.py", line 1063, in execute_command
    await conn.send_command(*args)
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 840, in send_command
    await self.send_packed_command(
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 829, in send_packed_command
    raise ConnectionError(
aioredis.exceptions.ConnectionError: Error 103 while writing to socket. Software caused connection abort.

So we switched error messages, but the issue persists.
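
For reference, a hedged sketch of the "equivalent test program using redis" mentioned above; this is an assumed translation of the aioredis snippet, not necessarily the exact code that was run:

from redis import Redis
import time

redis = Redis()
redis.set("something", "something")

while True:
    # the synchronous client is expected to re-establish the connection on the next
    # command after a reset/abort (behaviour observed with redis 3.5.x, per a later comment)
    print(redis.get("something"))
    time.sleep(5)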

Enchufa2 avatar Dec 02 '21 12:12 Enchufa2

A workaround would be to call redis.connection_pool.disconnect() before performing any operation before/after a long pause where a reset may happen.

Enchufa2 avatar Dec 02 '21 14:12 Enchufa2

@Enchufa2 thanks for running the test. What do you mean by "automagically"? Is there code they implement that we somehow missed, or code that we did port that is not functioning properly? Does the internal connection not handle this properly?

Andrew-Chen-Wang avatar Dec 02 '21 15:12 Andrew-Chen-Wang

I mean that redis somehow figures out that the connection is broken, disposes of it, and opens a new one instead of failing. I'm not sure how redis does this, and therefore what's missing here, because unfortunately I'm not familiar with either codebase. But redis's behaviour is what I would expect from a connection pool. Otherwise, one needs to think about whether there are connections and whether they are alive, which is contrary to the very abstraction of a connection pool, right?

Enchufa2 avatar Dec 02 '21 15:12 Enchufa2

@Enchufa2 can you change this:

https://github.com/aio-libs/aioredis-py/blob/dbdd0add63f986f2ed2d56c9736303d133add23c/aioredis/connection.py#L850

to if not self.is_connected:

Redis is checking using self._sock, but we don't use self._sock. This could be the underlying reason, though I'm not sure how we didn't catch this early on or if a PR changed this somehow.

Andrew-Chen-Wang avatar Dec 02 '21 15:12 Andrew-Chen-Wang

Nope, this doesn't help, same error. By the way, I noticed that I switched the errors coming from the current release and the current master branch in my previous comment. Apologies, the comment is amended now.

Enchufa2 avatar Dec 02 '21 18:12 Enchufa2

This is interesting. It turns out I had redis v3.5.3 installed, and this is the version that recovers from connection aborts and resets. I just updated to v4.0.2 and it shows this same issue.

Enchufa2 avatar Dec 02 '21 20:12 Enchufa2

Reported in https://github.com/redis/redis-py/issues/1772

Enchufa2 avatar Dec 02 '21 21:12 Enchufa2

@Enchufa2 I'm unable to reproduce this error with either the redis main/master branch or aioredis==2.0.0: https://github.com/Andrew-Chen-Wang/aioredis-issue-778. Unlike ss, I'm using CLIENT KILL and CLIENT KILL ADDR, with no success. When doing CLIENT LIST, in both cases a new connection is established, shown by an incremented ID. I was using Gitpod since I don't have a tool similar to ss on Mac for killing sockets. There may be a chance CLIENT KILL gave aioredis a warning beforehand, but if a Connection reset by peer is occurring, then I can't imagine functionality similar to CLIENT KILL not being performed.

Andrew-Chen-Wang avatar Dec 02 '21 23:12 Andrew-Chen-Wang

If I use CLIENT KILL, I see this with the current master branch:

Traceback (most recent call last):
  File "***/test.py", line 9, in <module>
    print(loop.run_until_complete(redis.get("something")))
  File "/usr/lib64/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/client.py", line 1064, in execute_command
    return await self.parse_response(conn, command_name, **options)
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/client.py", line 1080, in parse_response
    response = await connection.read_response()
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 854, in read_response
    response = await self._parser.read_response()
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 367, in read_response
    raw = await self._buffer.readline()
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 301, in readline
    await self._read_from_socket()
  File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 250, in _read_from_socket
    raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)
aioredis.exceptions.ConnectionError: Connection closed by server.

Enchufa2 avatar Dec 02 '21 23:12 Enchufa2

We were also seeing this in production with aioredis==2.0.0. Our solution was to remove the connection pool and rely on single connections (although this isn't ideal).
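
For context, a hedged sketch of one way to rely on a single connection, assuming aioredis 2.0 exposes the same single_connection_client flag as redis-py (an illustration only, not the exact setup):

import asyncio
from aioredis import Redis

async def main():
    # assumes aioredis 2.0 mirrors redis-py's single_connection_client flag
    redis = Redis(host="localhost", port=6379, single_connection_client=True)
    await redis.set("key", "value")   # every command uses the one dedicated connection
    print(await redis.get("key"))
    await redis.close()

asyncio.run(main())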

@Andrew-Chen-Wang I see you were unable to reproduce, is there any current plan to look into this further? Or would you like someone to put together a PR with a potential fix?

cjdsellers avatar Dec 21 '21 19:12 cjdsellers

@cjdsellers I still can't reproduce this issue, so a PR is much appreciated as we're down a maintainer now and I'm stuck with work.

Andrew-Chen-Wang avatar Dec 21 '21 19:12 Andrew-Chen-Wang

I totally understand regarding the bandwidth. Maybe @Enchufa2 could chime in here with a solution strategy, as he seems to be very familiar with the issue?

cjdsellers avatar Dec 21 '21 19:12 cjdsellers

Tracking https://github.com/redis/redis-py/issues/1789

Andrew-Chen-Wang avatar Dec 21 '21 22:12 Andrew-Chen-Wang

@cjdsellers My bet is that a possible solution would be to catch OSError in https://github.com/aio-libs/aioredis-py/blob/224f843bd4b33d657770bded6f86ce33b881257c/aioredis/connection.py#L1423 but I haven't had the time to test this yet. I'm also stuck with work :(, and the workaround in https://github.com/aio-libs/aioredis-py/issues/778#issuecomment-984654896, although not ideal, works well.

Enchufa2 avatar Dec 22 '21 10:12 Enchufa2

OK, I could put together a PR for you guys to look at. I'll see what I can do over this festive period.

@Enchufa2 In the meantime I did try the workaround you suggested in your comment; however, maybe my implementation wasn't correct, because the issue remained. If you're successfully using this workaround, then some sort of code snippet would be much appreciated!

cjdsellers avatar Dec 22 '21 19:12 cjdsellers

> OK, I could put together a PR for you guys to look at. I'll see what I can do over this festive period.

There are situations in which aioredis fails and redis does not (for instance, after a CLIENT KILL, see https://github.com/redis/redis-py/issues/1772). So keep in mind that aioredis may have additional issues not present in redis. In this regard, I would try a fix in redis first, and then port it here.

> @Enchufa2 In the meantime I did try the workaround you suggested in your comment; however, maybe my implementation wasn't correct, because the issue remained. If you're successfully using this workaround, then some sort of code snippet would be much appreciated!

Currently I'm using aioredis in a microservice that makes a snapshot of certain information every n minutes. The code is not public, but this is essentially what it does:

async def set_snapshot():
    # ...
    try:
        s = snapshot.get()
        await redis.set("snapshot-identifier", s)
    finally:
        # workaround for https://github.com/aio-libs/aioredis-py/issues/778
        await redis.connection_pool.disconnect()

As I said, this fragment is currently called every n minutes, so destroying the pool on every call doesn't really have much impact in my case. My original issue started when the infra people set up a firewall to automatically abort long-lasting connections.

Enchufa2 avatar Dec 22 '21 20:12 Enchufa2

So this simple fix https://github.com/redis/redis-py/pull/1832 solves the issue for redis. That patch is required here but not sufficient, because the situation is more complex: aioredis has not caught up with how redis manages command execution. redis retries unconditionally, e.g. here

https://github.com/redis/redis-py/blob/04b8d34e212723974b9b1f484fe7cd9e93f0e315/redis/client.py#L1171-L1175

However, aioredis fails unless the error is a TimeoutError here

https://github.com/aio-libs/aioredis-py/blob/a28e0e9a21f6b645952d888e65f29b433218fba5/aioredis/client.py#L1088-L1089

Porting https://github.com/redis/redis-py/pull/1832 and commenting out those two lines above solves this issue. But that's not a solution, obviously. Therefore, aioredis would need to port the Retry class from redis first, or a similar mechanism, and then modify all the client functions to use call_with_retry.
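
A hedged sketch of what such a retry mechanism could look like, loosely modeled on redis-py's Retry and call_with_retry (the helper names in the trailing comment are hypothetical, not existing aioredis methods):

import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

class Retry:
    """Minimal stand-in for redis-py's Retry; illustration only."""

    def __init__(self, retries: int = 1, backoff: float = 0.1):
        self._retries = retries
        self._backoff = backoff

    async def call_with_retry(
        self,
        do: Callable[[], Awaitable[T]],
        on_error: Callable[[Exception], Awaitable[None]],
    ) -> T:
        failures = 0
        while True:
            try:
                return await do()
            except (ConnectionError, OSError) as exc:
                # let the caller dispose of the broken connection, then retry
                await on_error(exc)
                failures += 1
                if failures > self._retries:
                    raise
                await asyncio.sleep(self._backoff)

# Inside execute_command, the send/read cycle would then be wrapped, e.g.
# (hypothetical helper names):
#   return await self.retry.call_with_retry(
#       lambda: self._send_and_read(conn, *args, **options),
#       lambda error: self._disconnect_on_error(conn, error),
#   )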

Enchufa2 avatar Dec 24 '21 10:12 Enchufa2