umongo
umongo copied to clipboard
Check if value changed before marking as modified
Data proxy is always marked as modified after its set()
method is called. It does not check if the value was really changed before marking it as modified.
https://github.com/Scille/umongo/blob/15f0ca9751ed9e1c41d5e7aa98a9d075b8ec2e38/umongo/data_proxy.py#L145-L153
I think it can be more performant if the equivalence check is performed since it may save a commit to database.
Correct me if I misunderstand anything. 😃
I guess it is a balance.
If a workflow implies many affectations (setattr) before a single commit, the perf penalty from the test could be important. I'm thinking about huge nested structures for which the equality could be long to compute.
Don't the huge nested structures also take lots of bandwidths to transfer across network even there is no any actual changes in the structures?
It seems that it is another trade-off problem between IO cost or computation cost. Is there a way to delegate the decision (whether or not the commit should be triggered and sent to the database) to developers?
I see. I didn't think of the remote database case.
Assuming we manage to check this conditionally, I suppose we could add a parameter to choose that. Perhaps a class attribute of BaseField
. I don't think we need to pass this through all methods.
Would you like to investigate this?
I'll provide some pseudo codes to demonstrate the possible implementation in my imagine.
First of all, there is a comparator
attribute on the BaseField
class which is a function with two inputs (the old value and the new value) and one boolean output (represents whether the values are actually the same or not).
# pseudo code
class BaseField:
comparator = lambda old_value, new_value: old_value == new_value
Since each subclass of BaseField
knows the data type it holds, it can override the comparator
to perform the equivalence test on the specific data type.
# pseudo code
class CustomField(BaseField):
# assume the data in this field is a dict with a specific structure
# key "key" holds the information about whether the value changed or not
comparator = lambda old_value, new_value: old_value["key"] == new_value["key"]
The comparator
is invoked inside the BaseDataProxy#set()
method to test if the data is modified. If the comparator returns True
, means the two values are actually the same, the data is not marked as modified; otherwise, if the comparator returns False
, it is modified.
def set(self, name, value, to_raise=KeyError):
# ... skipped #
if not field.comparator(old_value, new_value):
self._data[name] = new_value
self._mark_as_modified(name)
To keep the compatibility with the previous behavior, the comparator
can defaults to a function that always returns False
in the BaseField
class, which will always mark dirty if BaseDataProxy#set()
is invoked.
# pseudo code
class BaseField:
comparator = lambda o, n: False # for compatibility
def __init__(self, comparator=None):
self.comparator = comparator or self.comparator
Besides subclassing, develops can provide comparator
keyword argument to override the behavior on an existing subclass of BaseField.
I think most of the fields can be compared via ==
operator. If "huge nested structure" is a concern for some of the fields, we can make comparator
s of those fields always return False
.
But I'm not really sure which fields, except DictField
and ListField
(and EmbeddedField
maybe?), may contains huge nested structure. Hope that you can help to list those classes out so I can skip their comparators.
What do you think about this? Does it seem feasible?
I can prepare a PR during this weekend.
Yes, I'm thinking of EmbeddedField
, alone or in a container field such as List (or Dict when moving to marshmallow 3).
I'd rather use normal functions than lambdas. I thought ==
would work for all fields. I didn't think we'd need a dedicated function.
Couldn't it just be
def set(self, name, value, to_raise=KeyError):
# ... skipped #
if value != self._data[name]:
self._data[name] = new_value
self._mark_as_modified(name)
Don't forget delete
method.
The flag activating the feature can also be in BaseDataProxy
:
class BaseDataProxy:
CHECK_MODIFIED = False
def set(self, name, value, to_raise=KeyError):
# ... skipped #
if self.CHECK_MODIFIED and value != self._data[name]:
self._data[name] = new_value
self._mark_as_modified(name)
The reason I use a dedicated function is that I'm not sure if ==
can satisfied all types of fields, therefore I provide a way for developers to override the function via subclass.
But as you said if ==
should work for all fields, then I think using ==
has no problem.
One more question, if the flag is as an attribute of BaseDataProxy, how should a developer activate the feature?
class BaseDataProxy:
CHECK_MODIFIED = False
@instance.register
class MyDocument(Document):
# how to activate the check_modified feature?
pass
Many thanks.
I suppose this would work.
from umongo.dataproxy import BaseDataProxy
BaseDataProxy.CHECK_MODIFIED = True
But yes, it looks ugly.
I'm totally open to alternative proposals. For this and for the name of the flag as well.
Shouldn't this affect all subclasses of BaseDataProxy? It sounds like this is a global flag.
BaseDataProxy.CHECK_MODIFIED = True
I was thinking about to activate the feature per Document class (typically means a collection) or per field.
Is there a way to provide the flag on Document classes, maybe on the Meta class?
I guess in some use cases, people may want to set the global flag to True
while escaping checking on certain collections or fields which contain huge nested structures.
Another incentive to implement this is the case where one wants to forbid modifying an attribute once the document is saved. This could be done in a pre_update
processor that would check if the attribute was modified. Not perfect as it will only complain on save, not on attribute modification. I didn't find a better way, though.
@momocow Yes, my proposal was a global flag. But I have no objection to a per-collection flag. Can't investigate this right now.
I'm testing on my own branch, making it as a per-collection flag and a per-field flag also. A demo snippet will be provided when I think I'm ready ;)