Closing connections (disconnect) is not thread-safe
Calling disconnect or disconnect_all in a multi-threaded environment is unsafe. Consider the following code:
import threading
import time
from mongoengine import *
connect(host="mongodb://127.0.0.1:9800/test_db_1")
class User(Document):
email = StringField(required=True)
first_name = StringField(max_length=50)
last_name = StringField(max_length=50)
def thread_1_task():
for user in User.objects:
print(user.email)
disconnect()
def thread_2_task():
time.sleep(5)
for user in User.objects: # <----- This query raises a mongoengine.connection.ConnectionFailure
print(user.email)
disconnect()
def main():
t1 = threading.Thread(target=thread_1_task)
t2 = threading.Thread(target=thread_2_task)
t1.start()
t2.start()
t1.join()
t2.join()
if __name__ == "__main__":
main()
After taking a look at connection.py, I found that the problem was that connections/db references were being stored in a process level dict. The solution is to probably use either threading.local or contextvars to store these references so that each thread / co-routine would get its own connection object thereby making it safe to execute disconnect in concurrent environments.
If agreed upon, I can submit a PR which I've been working on to fix this.
Could you elaborate on your use case, why do you need to disconnect?
Hey @bagerard , I do agree that it might not be the most common use-case for developers to call disconnect once their app has been booted up. But the following cases may be of importance:
Assuming that I have a constrained resource system like an IoT board which can only have a certain no of file descriptors(sockets) open and that my application talks to 5 different databases, then I might want to recycle connections if the open file descriptor buffer approaches max cap. Or just so that I can return resources to the system, I might want to close connections at the end of a request. Though this might not be the most performant, it would fit my constraint.
Also, it seems that the switch_db context manager is not thread-safe as well.
def __enter__(self):
"""Change the db_alias and clear the cached collection."""
self.cls._meta["db_alias"] = self.db_alias
self.cls._collection = None
return self.cls
The above code is setting the db_alias property at the class level which is shared across threads. If I use switch_db in the route handler of a threaded server, then the change in db_alias caused by one request is going to affect the other request.
The solution to both the above problems is to probably make connections/connection state thread-local. This is typically being used by other ORMs' like Django, SQLAlchemy, peewee etc.,
Do let me know your thoughts on the same.