redis-py icon indicating copy to clipboard operation
redis-py copied to clipboard

Async cancellation swallowed by BRPOPLPUSH

Open aaliddell opened this issue 2 years ago • 3 comments

Should issues be submitted here or on aioredis package now that they've merged?

Using the BRPOPLPUSH command on an asyncio pool, if a cancellation occurs at exactly the right time it is swallowed and never bubbled up to calling code. This leads to tasks that never exit and violates the convention that CancelledError should never be swallowed. The same behaviour is seen with BRPOP, so I assume this is related to blocking calls. I can't recreate with non-blocking calls, but it may just be the race window is smaller there.

I can't see an obvious place for this swallowing of the cancellation in the code, but you are catching BaseException which will catch CancelledError (unlike catching on Exception), so care needs to be taken to ensure it is always re-raised after any cleanup is done. In cases where you are catching BaseException it appears you are trying to disconnect, so perhaps there is a deadlock between the disconnect and the command submit?

Demo

Run the following inside a loop (wrap it all in asyncio.run or just run raw in IPython cell). It will not fail every time, as it is timing dependent. You will have to adjust the 0.003 second delay up or down depending on your system speed until it begins to fail. I get about 25% failure with the code below.

import asyncio
from redis import asyncio as aioredis

pool = aioredis.from_url('redis://')

async def runner():
    print('start')
    try:
        async with pool as conn:
            print('have conn')
            
            await conn.get('key')
            
            print('done get')
            
            await conn.brpoplpush('key', 'other key')
            
            print('never expect to get here due to blocking')
    finally:
        print('bye')

task = asyncio.create_task(runner())
print('created')
await asyncio.sleep(0.003)  # <- Try to land cancel at exact right point
task.cancel()
print('cancelled')

await asyncio.sleep(1) # <- Give a generous time to complete cancel
assert task.cancelled() # <- Expect this to never fail
print('ok')

The above code should never fail the assertion, as we have cancelled the task and the CancelledError should propagate out within the time we've given it (waiting longer doesn't help, the task never stops). If the 0.003 second delay is decreased, it never fails. Similarly, if the 0.003 second delay is increased, it also never fails. So there is a short window during the start of the conn.brpoplpush command that the cancellation is swallowed and the task never exits.

On a failure, the following will be printed:

created
start
have conn
done get
cancelled

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/tmp/ipykernel_3521979/4075451794.py in <module>
     28 
     29 await asyncio.sleep(1)
---> 30 assert task.cancelled()
     31 print('ok')

AssertionError: 

On a good run, you will get the following if the cancel was too early:

created
start
have conn
cancelled
bye
ok

Or if the cancel was too late to trigger:

created
start
have conn
done get
cancelled
bye
ok

Info

Version: 4.2.0rc1 and also 2.0.1 of aioredis

Platform: Linux, Python 3.8.10 and 3.10.2 confirmed

aaliddell avatar Feb 28 '22 19:02 aaliddell

This issue is marked stale. It will be closed in 30 days if it is not updated.

github-actions[bot] avatar Mar 01 '23 00:03 github-actions[bot]

bump

jakob-keller avatar Mar 01 '23 08:03 jakob-keller

We solved few bags with async canceled errors from the time this issue opened, can someone check if this is still relevant or solved by the fixes we did?

dvora-h avatar Feb 11 '24 00:02 dvora-h