databases icon indicating copy to clipboard operation
databases copied to clipboard

Using ContextVar to hold connections does not guarantee they are always unique to tasks.

Open trvrm opened this issue 5 years ago • 4 comments

I think there's a fairly fundamental problem with the way that database connections are managed using ContextVar.

Database.connection() is designed to return a Connection object that is unique to an asyncio Task, but because of the way that context is copied during Task creation, this is not always the case.

Consider the following code:

import asyncio
from databases import Database
database = Database("postgresql://example:password@localhost/example")

async def t1():
    await database.connect()
    c1 = database.connection()

    async def inner():
        c2 = database.connection()
        assert c1 is not c2 # this fails
    await asyncio.gather(inner())


if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(t1())

This raises an assertion, because when gather creates a new task, it copies the current context into the newly created coroutine.

Tasks support the contextvars module. When a Task is created it copies the current context and later runs its coroutine in the copied context.

  • https://docs.python.org/3/library/asyncio-task.html#task-object

This is the fundamental problem behind https://github.com/encode/databases/issues/134 and possibly several other issues. In general it makes it impossible to safely use this library in any code that uses gather or create_task.

Unfortunately I'm not sure what the best way of fixing this is. The only idea I've had so far is storing the connection as an attribute of asyncio.current_task() rather than in a ContextVar, but that feels a bit hacky.

trvrm avatar Jul 27 '20 19:07 trvrm

This is (almost) a duplicate of https://github.com/encode/databases/issues/176 I posted some workarounds there. I am opening new connections in my prod and that works very well; my opinion is that this ContextVar feature is harmful and should be removed. It does not save much performance (I py-spy regularly) and leads to many problems, some of them are performance-critical.

vmarkovtsev avatar Jul 28 '20 07:07 vmarkovtsev

@trvrm This is probably fixed by #328 . Please re-open if it still exists.

aminalaee avatar Aug 26 '21 17:08 aminalaee

@aminalaee unfortunately this does not fix this issue. The sample code above still fails.

The problem is still that although the code comments in core.py say "Connections are stored as task-local state", the way that python contextvars work is that when you create a new task, the context variables from the parent task are copied to the child.

Unfortunately I don't think I have permission to re-open this issue - is that something you can do?

trvrm avatar Aug 27 '21 22:08 trvrm

@trvrm Thanks, I'll take a look into it.

aminalaee avatar Aug 28 '21 05:08 aminalaee

Thanks for the great write up @trvrm - I agree about how connections should not be inherited to child tasks, but rather new connections should be acquired. In #546, I think I have a solution. Each database instance stores a WeakKeyDictionary[asyncio.Task, databases.core.Connection] so that each task can have it's own connection, and we still get a neat bit of cleanup by using weakrefs on the tasks. Hopefully this helps address the concurrent woes that keep coming up across quite a few issues in this repo.

zevisert avatar May 26 '23 18:05 zevisert