mongoengine icon indicating copy to clipboard operation
mongoengine copied to clipboard

Closing connections (disconnect) is not thread-safe

Open blazing-gig opened this issue 4 years ago • 2 comments

Calling disconnect or disconnect_all in a multi-threaded environment is unsafe. Consider the following code:

import threading


import time
from mongoengine import *

connect(host="mongodb://127.0.0.1:9800/test_db_1")

class User(Document):
    email = StringField(required=True)
    first_name = StringField(max_length=50)
    last_name = StringField(max_length=50)

def thread_1_task():
    for user in User.objects:
        print(user.email)
    disconnect()

def thread_2_task():
    time.sleep(5)
    for user in User.objects:   # <----- This query raises a mongoengine.connection.ConnectionFailure
        print(user.email)
    disconnect()


def main():
    t1 = threading.Thread(target=thread_1_task)
    t2 = threading.Thread(target=thread_2_task)

    t1.start()
    t2.start()

    t1.join()
    t2.join()

if __name__ == "__main__":
    main()

After taking a look at connection.py, I found that the problem was that connections/db references were being stored in a process level dict. The solution is to probably use either threading.local or contextvars to store these references so that each thread / co-routine would get its own connection object thereby making it safe to execute disconnect in concurrent environments.

If agreed upon, I can submit a PR which I've been working on to fix this.

blazing-gig avatar Aug 04 '21 13:08 blazing-gig

Could you elaborate on your use case, why do you need to disconnect?

bagerard avatar Aug 23 '21 20:08 bagerard

Hey @bagerard , I do agree that it might not be the most common use-case for developers to call disconnect once their app has been booted up. But the following cases may be of importance:

Assuming that I have a constrained resource system like an IoT board which can only have a certain no of file descriptors(sockets) open and that my application talks to 5 different databases, then I might want to recycle connections if the open file descriptor buffer approaches max cap. Or just so that I can return resources to the system, I might want to close connections at the end of a request. Though this might not be the most performant, it would fit my constraint.

Also, it seems that the switch_db context manager is not thread-safe as well.

def __enter__(self):
        """Change the db_alias and clear the cached collection."""
        self.cls._meta["db_alias"] = self.db_alias
        self.cls._collection = None
        return self.cls

The above code is setting the db_alias property at the class level which is shared across threads. If I use switch_db in the route handler of a threaded server, then the change in db_alias caused by one request is going to affect the other request.

The solution to both the above problems is to probably make connections/connection state thread-local. This is typically being used by other ORMs' like Django, SQLAlchemy, peewee etc.,

Do let me know your thoughts on the same.

blazing-gig avatar Dec 09 '21 10:12 blazing-gig