channels_redis icon indicating copy to clipboard operation
channels_redis copied to clipboard

Wrong order of messages in process specific channel if receive call was cancelled.

Open sergey-komissarov opened this issue 5 years ago • 2 comments

Hello, we are using group to send notifications from channels worker process to the websocket consumer in the server process. There is exactly one worker in the group and exactly one consumer on the server side. Sometimes messages order is changed on the server side.

Detailed investigation shows the following events on the server side:

  1. For the channel specific.innxCCgU!NcMqIZztkRQJ receive_lock acquired and receive_single started.
  2. Before this call core.py#L483 the redis channel messages list had 3 messages and backup list was empty:
asgi:specific.innxCCgU! = [msg3, msg2, msg1]
asgi:specific.innxCCgU!$inflight = []
  1. Receive was cancelled and receive_lock was released:
Traceback (most recent call last):
  File "/pSeven/venv/lib/python3.7/site-packages/channels_redis/core.py", line 429, in receive
    real_channel
  File "/pSeven/venv/lib/python3.7/site-packages/channels_redis/core.py", line 484, in receive_single
    index, channel_key, timeout=self.brpop_timeout
  File "/pSeven/venv/lib/python3.7/site-packages/channels_redis/core.py", line 329, in _brpop_with_clean
    return await connection.brpoplpush(channel, backup_queue, timeout=timeout)
concurrent.futures._base.CancelledError
  1. The next call to receive for the channel specific.innxCCgU!EKHzfqiTYmHm acquired receive_lock and found redis lists in the following state:
asgi:specific.innxCCgU! = [msg3, msg2]
asgi:specific.innxCCgU!$inflight = [msg1]

It seems that the redis server already moved message to the backup list during brpoplpush operation but the message body was not fully received when the task was cancelled. 5. After core.py#L327 msg1 is placed to the beginning of the list asgi:specific.innxCCgU! and msg2 is returned by receive. 6. Next call to the receive will return msg3 and only the next will get msg1 from the redis.

The simplest way to fix it is to modify script at core.py#L316 and place messages from the backup list at the end of the channel list. But I also think that redis backup list cleanup must be done under the receive_lock because another call to receive may acquire lock and place message from the backup list back to the channel list before cleanup coroutine will be called.

Our environment setup (docker image based on ubuntu and daphne as a server): python version: 3.7.1 installed packages:

aioredis                   1.2.0      
asgiref                    2.3.2      
channels                   2.1.7      
channels-redis             2.3.3      
daphne                     2.2.5      
Django                     2.1.7      
django-channels-jsonrpc    1.2.0      

sergey-komissarov avatar Apr 04 '19 14:04 sergey-komissarov

Does this bug still exists? Maybe channels needs tests which simulate cancelation of every network operation.

HMaker avatar Apr 08 '23 02:04 HMaker

@HMaker I believe this bug is fixed. Last time I checked channels used priority queue based on message creation time instead of list. This must keep messages in order.

sergey-komissarov avatar Apr 10 '23 10:04 sergey-komissarov