
channels_redis keeps stale connection to redis

Open rythm-of-the-red-man opened this issue 1 year ago • 1 comments

Stack

  • Redis instance hosted on Azure (Azure Cache for Redis)
redis = "^5.0.0"
django-redis = "^5.2.0"
channels-redis = "^4.2.0"
channels = { extras = ["daphne"], version = "^4.0.0" }
Django = "~4.2"

All hosted on Azure Kubernetes Service behind ingress-nginx and a load balancer.

Traceback

ERROR 2024-06-20 19:41:09,956 daphne.server Exception inside application: Error UNKNOWN while writing to socket. Connection lost.
Traceback (most recent call last):
  File '/usr/local/lib/python3.11/site-packages/redis/asyncio/connection.py', line 473, in send_packed_command
    await self._writer.drain()
  File '/usr/local/lib/python3.11/asyncio/streams.py', line 392, in drain
    await self._protocol._drain_helper()
  File '/usr/local/lib/python3.11/asyncio/streams.py', line 166, in _drain_helper
    raise ConnectionResetError('Connection lost')
ConnectionResetError: Connection lost

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File '/usr/local/lib/python3.11/site-packages/channels/routing.py', line 62, in __call__
    return await application(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File '/app/myapp/websockets/middleware.py', line 25, in __call__
    return await super().__call__(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File '/usr/local/lib/python3.11/site-packages/channels/middleware.py', line 24, in __call__
    return await self.inner(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File '/usr/local/lib/python3.11/site-packages/channels/routing.py', line 132, in __call__
    return await application(
           ^^^^^^^^^^^^^^^^^^
  File '/usr/local/lib/python3.11/site-packages/channels/consumer.py', line 94, in app
    return await consumer(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File '/usr/local/lib/python3.11/site-packages/channels/consumer.py', line 58, in __call__
    await await_many_dispatch(
  File '/usr/local/lib/python3.11/site-packages/channels/utils.py', line 50, in await_many_dispatch
    await dispatch(result)
  File '/usr/local/lib/python3.11/site-packages/channels/consumer.py', line 73, in dispatch
    await handler(message)
  File '/usr/local/lib/python3.11/site-packages/channels/generic/websocket.py', line 249, in websocket_disconnect
    await self.disconnect(message['code'])
  File '/app/myapp/websockets/consumers.py', line 24, in disconnect
    await self.channel_layer.group_discard(self.group_name, self.channel_name)
  File '/usr/local/lib/python3.11/site-packages/channels_redis/core.py', line 518, in group_discard
    await connection.zrem(key, channel)
  File '/usr/local/lib/python3.11/site-packages/sentry_sdk/integrations/redis/asyncio.py', line 66, in _sentry_execute_command
    return await old_execute_command(self, name, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File '/usr/local/lib/python3.11/site-packages/redis/asyncio/client.py', line 612, in execute_command
    return await conn.retry.call_with_retry(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File '/usr/local/lib/python3.11/site-packages/redis/asyncio/retry.py', line 62, in call_with_retry
    await fail(error)
  File '/usr/local/lib/python3.11/site-packages/redis/asyncio/client.py', line 599, in _disconnect_raise
    raise error
  File '/usr/local/lib/python3.11/site-packages/redis/asyncio/retry.py', line 59, in call_with_retry
    return await do()
           ^^^^^^^^^^
  File '/usr/local/lib/python3.11/site-packages/redis/asyncio/client.py', line 585, in _send_command_parse_response
    await conn.send_command(*args)
  File '/usr/local/lib/python3.11/site-packages/redis/asyncio/connection.py', line 497, in send_command
    await self.send_packed_command(
  File '/usr/local/lib/python3.11/site-packages/redis/asyncio/connection.py', line 484, in send_packed_command
    raise ConnectionError(
redis.exceptions.ConnectionError: Error UNKNOWN while writing to socket. Connection lost.
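
The traceback above ends in group_discard, called from the consumer's disconnect handler, failing on the first write to a socket that was already closed. One possible application-level mitigation is to retry the call once so that a fresh connection can be opened. The following is a minimal sketch, not part of channels or channels_redis: call_with_reconnect and its parameters are hypothetical names, and whether a single retry is enough for this setup has not been verified.

```python
import asyncio


async def call_with_reconnect(operation, retry_on, attempts=2, delay=0.1):
    """Await operation(), retrying when it raises one of retry_on.

    Hypothetical helper: operation is a zero-argument callable that returns
    an awaitable, retry_on is a tuple of exception classes to retry on.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            # Short pause before trying again on a new connection.
            await asyncio.sleep(delay)


# Sketch of use inside the consumer's disconnect() from the traceback:
#     from redis.exceptions import ConnectionError as RedisConnectionError
#     await call_with_reconnect(
#         lambda: self.channel_layer.group_discard(
#             self.group_name, self.channel_name
#         ),
#         retry_on=(RedisConnectionError,),
#     )
```

The exception classes are passed in explicitly because redis.exceptions.ConnectionError is not a subclass of Python's built-in ConnectionError, so catching the built-in alone would miss it.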

Description

This issue keeps happening on prod or the test env (more or less a clone of prod) when we couple channels with Redis.

I suspect that the managed Redis instance times out idle connections and channels_redis does not attempt to reconnect (the idea might be dumb, though; if so, I'm sorry, I don't know much about the internals of channels_redis or Redis in general). I think this might be the case because the pattern looks like this: the issue occurs when I start the client app, wait ~10 minutes, and then try any action related to channels, such as re-establishing the websocket connection by refreshing the page.
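
If the idle-timeout hypothesis is right, one thing worth trying is to pass redis-py connection options through the channel layer's hosts entry, so that a connection that has sat idle is pinged before it is reused. The sketch below assumes that channels_redis 4.x forwards the extra keys of a host dict to the redis-py connection pool, and that the server drops connections after about 600 seconds, as the ~10 minute observation suggests. build_channel_layers is a hypothetical helper and the URL is a placeholder; this is untested against Azure Cache for Redis.

```python
def build_channel_layers(redis_url, idle_timeout_seconds=600):
    """Return a CHANNEL_LAYERS dict with connection health checks enabled.

    Hypothetical helper; idle_timeout_seconds is an assumed server-side
    idle timeout, not a value read from the server.
    """
    # Ping a connection that has been idle for a quarter of the assumed
    # timeout before sending the next command on it.
    health_check_interval = max(1, idle_timeout_seconds // 4)
    return {
        "default": {
            "BACKEND": "channels_redis.core.RedisChannelLayer",
            "CONFIG": {
                "hosts": [
                    {
                        "address": redis_url,
                        # The keys below are redis-py connection keyword
                        # arguments, assumed to be passed through as-is.
                        "health_check_interval": health_check_interval,
                        "socket_keepalive": True,
                        "retry_on_timeout": True,
                    }
                ],
            },
        }
    }


# In settings.py (placeholder URL, replace with the real one):
CHANNEL_LAYERS = build_channel_layers("rediss://:password@example-host:6380/0")
```

The interval is derived from the timeout rather than hard-coded so the two values cannot drift apart if the server-side setting changes.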

I assumed it might be a channels_redis bug, which is why I'm reporting it here. I'd love any feedback. Thanks in advance.

Strange part

Well, this is kind of odd, but since it happened I decided to include it here. When I run the daphne instance from the Dockerfile like this:

ENTRYPOINT ["/app/etc/entrypoint.sh"]
CMD ["web-prod"]

#########     entrypoint.sh calls this script
#!/bin/bash
# only now we have access to environmental variables so we can call collectstatic
python manage.py collectstatic --noinput -v 0 > /dev/null 2>&1
# run app
daphne -b 0.0.0.0 -p 8000  medishout.asgi:application  -v 3

the issue appears. But if I start another server after connecting to the running pod, like

daphne -b 0.0.0.0 -p 8001  medishout.asgi:application  -v 3

and I connect to the second one, the issue doesn't seem to appear (or I didn't manage to catch it).

rythm-of-the-red-man, Jun 21 '24 08:06