mongoengine
mongoengine copied to clipboard
Transaction API
I'd like to open the discussion on the interface that we want to use for supporting the use of transactions/sessions within MongoEngine.
Items to keep in mind:
- Session needs to be used in all queries (write/update obviously but also any read queries)
- There can be multiple sessions used concurrently, (e.g: 1 per database, not sure we want to support multiple sessions within 1 database)
- Need to keep in mind the signals (pre/post_save, etc) that could benefit from this
- Need to keep in mind the deletion rules (CASCADE, etc) that could benefit from this
- Need to keep in mind that different databases can be connected to different models (+things like
switch_database)
Assuming a test class like
class Person(Document):
name = StringField()
I see 3 options:
- Explicitly pass the
sessionobject to each and every queryset/queries
with start_session_with_transaction() as session:
person = Person()
person.save(session=session)
Person.objects(name='test', session=session)
Person.objects.aggregate([a_pipeline], session=session)
Person.objects().update(name="Updated Test User", session=session)
Person.objects.insert(Person(), session=session)
Pros:
- Explicit
Cons:
- Users are forced to pass the session around in the whole code base
Person.objects(name='test', session=session)looks a bit weird- Means that we need to account for contradictory cases like
Person.objects(name='test', session=session1).update(name='test2', session=session2)
- Allow to bind a Queryset to a session with
Document.objects().session(session)
with start_session_with_transaction() as session:
person = Person()
person.save(session=session)
# Binds a session to a queryset
qs = Person.objects().session(session)
qs.filter(name='test')
qs.count()
qs.aggregate([your_fancy_pipeline])
Person.objects().session(session).update(name="Updated Test User")
Person.objects().session(session).insert(Person())
Pros:
- more or less explicit
Cons:
- Users are forced to pass the session around in the whole code base, every class involved in a request needs to be manually bound to the session
- I find it error prone, its very easy to forget it
Person.objects().session(session).insert(Person())looks quite ugly
- Implicitly bind all underlying operations to a session (à la django)
with run_in_transaction(alias="default"):
person = Person()
person.save()
qs = Person.objects()
(...)
Pros:
- simple to use
- compatible with existing code base - i.e: easy adoption by users
- mimic django (django being an inspiration for this project)
- less error prone since user only has to define a
Cons:
- very implicit
- does not comply with a
repository pattern / unit of workapproach - unclear how we'll manage different databases
Of course, we would provide a decorator in addition to the context manager.
Personally I initially thought that 2) would be the best but I quickly came across a few examples that looked unnatural, so I'd suggest to go for 3)
Let me know what you think! (especially @wojcikstefan @erdenezul but happy to gather more feedbacks)
Option 3 looks better.
Option 3 looks good
I like option 3
I like option 3 too
I vote for option 3 too
Vote for option 3!
Is there any coding progress? I'm REALLY interested in using MongoDB's transactions.... so interested that I'd offer to code it but I'm still fairly new to Python.
Thanks for your interest @coughlinjake so far, I've only do a small proof of concept. I was in fact waiting to have some decisions from the maintainer (ping @wojcikstefan). I'm pretty busy now but if I manage to start at some point on this, I'll check if we can share the workload
I can help you there. If you are ok I can share the load. Do we have any design or spec available or need to write one?
Get Outlook for iOShttps://aka.ms/o0ukef
From: Jason Coughlin [email protected] Sent: Thursday, August 6, 2020 5:22:10 AM To: MongoEngine/mongoengine [email protected] Cc: Das, Ajitesh [email protected]; Comment [email protected] Subject: Re: [MongoEngine/mongoengine] Transaction API (#2248)
Is there any coding progress? I'm REALLY interested in using MongoDB's transactions.... so interested that I'd offer to code it but I'm still fairly new to Python.
— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/MongoEngine/mongoengine/issues/2248#issuecomment-669894053, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AO6S4FH7KMHPIKNJBSM5YSDR7KN7FANCNFSM4KI3XZAQ.
looks like, we have some helps at our disposals. If we have decided on the approach, we can go ahead start the project. Who takes the decision here? Can he please fill in. Thanks! Ajitesh
--
A person should not be too straight or too honest. Straight trees are cut first and honest people are screwed first. -Chanakya
From: Henry [email protected] Sent: Monday, July 13, 2020 10:29 PM To: MongoEngine/mongoengine [email protected] Cc: Das, Ajitesh [email protected]; Comment [email protected] Subject: Re: [MongoEngine/mongoengine] Transaction API (#2248)
Vote for option 3!
— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/MongoEngine/mongoengine/issues/2248#issuecomment-657975452, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AO6S4FAL6WRWG64I2MVK7I3R3PUKBANCNFSM4KI3XZAQ.
Hi @bagerard , I am not sure if I am on right track or not, but I modified the library for transaction support. Following is the snippet for using the session.
k=get_connection()
with k.start_session() as mongosession:
with mongosession.start_transaction():
prevData=getToLogValue(dtype=cls._ParentClass._fields.get('XYZ'),value=p['XYZ'])
prevData={cls._ParentClass._toForm.get('XYZ'):prevData}
item.update(session=mongosession, **n)
newData=getToLogValue(dtype=cls._ParentClass._fields.get('XYZ'),value=n['XYZ'])
newData={cls._ParentClass._toForm.get('XYZ'):newData}
itemLog=cls._ItemLog(user=user, action='updated', newData=newData, prevData=prevData, recId=item)
itemLog.save(session=mongosession)
P.S.:
- It is not mandatory to provide a session variable when you don't want sequential execution (transaction execution).
- CASCADE operation is also handled in case of delete
- Have used for read as well.
Is there any coding progress? Hope to realize this feature. Thanks
option 3 will be gooooooood
Will this feature be merged?
option 2 looks good for me
I've been playing around with an implementation for option 3, without getting very deep into the guts of mongoengine (yet 😈 ) and I have some interface/usage thoughts/questions based on poking I've done.
Transactions are fine to run across multiple DBs.
from mongoengine import *
conn1 = connect()
db1 = conn1.db1
db1.test1.insert_one({'foo1': 'bar1'})
db2 = conn1.db2
db2.test2.insert_one({'foo2': 'bar2'})
with conn1.start_session() as session:
with session.start_transaction():
db1.test1.insert_one({'foo1-1': 'bar1-1'}, session=session)
db2.test2.insert_one({'foo2-2': 'bar2-2'}, session=session)
The primary go/no-go point is the client instance (aka connection). So if you tried to run a transaction against two collections/databases across two completely different client instances - no good:
from mongoengine import *
conn1 = connect()
db1 = conn1.db1
db1.test1.insert_one({'foo1': 'bar1'})
conn2 = connect(alias='other-default', port=27018)
db2 = conn2.db2
db2.test2.insert_one({'foo2': 'bar2'})
with conn1.start_session() as session:
with session.start_transaction():
db1.test1.insert_one({'foo1-1': 'bar1-1'}, session=session)
db2.test2.insert_one({'foo2-2': 'bar2-2'}, session=session)
InvalidOperation: Can only use session with the MongoClient that started it
So that's good - internally, there are already checks that sessions are "consistent."
Fun fact - this doesn't raise an exception and goes through fine:
with conn1.start_session() as session:
with session.start_transaction():
db1.test1.insert_one({'foo1-1': 'bar1-1'}, session=session)
db2.test2.insert_one({'foo2-2': 'bar2-2'}, session=None)
Implementation samples
Global _session instance
_session = None
def _set_session(session):
global _session
_session = session
def _get_session():
global _session
return _session
@contextmanager
def run_in_transaction(alias=DEFAULT_CONNECTION_NAME):
conn = get_connection(alias)
with conn.start_session() as session:
with session.start_transaction():
_set_session(session)
yield
_set_session(None)
def disconnect(alias=DEFAULT_CONNECTION_NAME):
# normal disconnect stuff...
_session = None
Sessions by alias
_sessions = {}
def _set_session(session, alias=DEFAULT_CONNECTION_NAME):
global _sessions
_sessions[alias] = session
def _get_session(alias=DEFAULT_CONNECTION_NAME):
global _sessions
return _sessions.get(alias)
@contextmanager
def run_in_transaction(alias=DEFAULT_CONNECTION_NAME):
conn = get_connection(alias)
with conn.start_session() as session:
with session.start_transaction():
_set_session(session, alias)
yield
_set_session(None, alias)
def disconnect(alias=DEFAULT_CONNECTION_NAME):
# normal disconnect stuff...
if alias in _sessions:
del _sessions[alias]
Analysis
From a complexity stand point - I believe a global session would easier to manager, because
- deeper within the mongoengine machinery, we'll only need to provide pymongo the appropriate session kwarg via
session=_get_session() - on
disconnectwe simply set the_sessionvar back toNone
Given the cross-client protections pymongo provides, there's really only one situation that I can think of that would warrant maintaining sessions by alias - someone starts a transaction for conn1 and within it calls disconnect(alias="conn2"):
with conn1.start_session() as session:
with session.start_transaction():
db1.test1.insert_one({'foo1-1': 'bar1-1'}, session=session)
disconnect(alias='conn2')
// ... more db1 session things
But...this seems really weird to me though. My toy example above can be resolved with moving the disconnect out of the context block. The more complex case is the idea that there is a conditional within the transaction that, when met, would require closing the connection to another host. Again - this seems like an odd outlier case that I'm not sure warrants tracking sessions via connection.
If we were to go about tracking sessions via alias I think the primary point of complexity occurs in the lower-level mongoengine API for querysets and documents (and anything else we'd want to make "session conscious") - how does one provide the _get_session(alias=...) method the correct alias value?
While digging around in these lower levels - it does not seem easy to get the alias. That said:
- I have found that a document has a
_get_dbmethod. - the pymongo Database object returned has access to the pymongo Client
- the
aliascould be retrieved by iterating over the connections and returning thealiasfor the matching Client
I know this is a lot - and I welcome any and all questions. I've just been thinking about this pretty intensely for the past few days and I wanted to get my ideas out there, so (1) I don't forget all of this and (2) in case anyone has any feedback that might push me in different directions.
I deleted my last comment regrading coupling of databases/connections 🤦 - missed something simple and I went down a rabbit hole - sorry for the noise ❤️
@juannyG I would just like to push a very pragmatic, almost non technical argument :
- an ORM is a huge feature, 95 % of all projects will require it
- of all work done with an orm, quite a lot can be done without transactions, but also, quite a lot of the same work would be done better with transactions, I’d say it’s more than 10 % of use cases that could benefit from it.
- of all work done with transactions, only a few will need to be done across different mongo connections. It certainly will happen, and it will certainly be asked for. Simply because it’s murphy’s law, anything that can happen will happen, but, it’s probably going to be 10% or less of all work using transactions, and I think even less.
So, while it certainly will be needed in the future, it’s not as important to manage connections as having a transaction API.
You’ll know best if the effort is worth it, I just saw you were wondering about implementing it in your PR, so this is just to say that I think that, yes session-by-connection will be asked for, but it’s a corner case
My vote would be for 3) as well but instead of a global session, consider using a ContextVar so that multiple threads can execute transactions concurrently:
from contextvars import ContextVar
_session = ContextVar("_session", default=None)
def _set_session(session):
if _session.get():
raise InvalidOperation("nested run_in_transaction calls are not allowed")
return _session.set(session)
def _get_session():
return _session.get()
...
You may also want to consider using the ClientSession.with_transaction api because it handles retries automatically.
My 2 cents:
Are these options mutually exclusive?
I think you would like have option 2 anyways and it's quite easy to implement. It goes in line with pymongo==mongo interfaces and is a lot less magical. I suggested that this would be done first. If some one would do this would be open to merging it in? As this issue has ben open for almost three years we could just implement the simple one?
Option 3 is the more complex and magical and I can image there are many pitfalls that are not instantly obvious. Accidentially wrapping too much in one session etc.
One more point. Sometimes you don't need transaction, but just explicit session. Right now it's not possible run long queries with mongoengine at all as cursor will timeout after 30 minutes even with no_cursor_timeout=True.
https://www.mongodb.com/docs/manual/reference/method/cursor.noCursorTimeout/#session-idle-timeout-overrides-nocursortimeout
If a session is idle for longer than 30 minutes, the MongoDB server marks that session as expired and may close it at any time. When the MongoDB server closes the session, it also kills any in-progress operations and open cursors associated with the session. This includes cursors configured with noCursorTimeout() or a maxTimeMS() greater than 30 minutes.
Hi everyone, it's already 2024 and this seems to be a forgotten topic, any updates on this?