umongo icon indicating copy to clipboard operation
umongo copied to clipboard

Check if value changed before marking as modified

Open momocow opened this issue 5 years ago • 11 comments

Data proxy is always marked as modified after its set() method is called. It does not check if the value was really changed before marking it as modified.

https://github.com/Scille/umongo/blob/15f0ca9751ed9e1c41d5e7aa98a9d075b8ec2e38/umongo/data_proxy.py#L145-L153

I think it can be more performant if the equivalence check is performed since it may save a commit to database.

Correct me if I misunderstand anything. 😃

momocow avatar Nov 06 '19 09:11 momocow

I guess it is a balance.

If a workflow implies many affectations (setattr) before a single commit, the perf penalty from the test could be important. I'm thinking about huge nested structures for which the equality could be long to compute.

lafrech avatar Nov 06 '19 10:11 lafrech

Don't the huge nested structures also take lots of bandwidths to transfer across network even there is no any actual changes in the structures?

It seems that it is another trade-off problem between IO cost or computation cost. Is there a way to delegate the decision (whether or not the commit should be triggered and sent to the database) to developers?

momocow avatar Nov 07 '19 08:11 momocow

I see. I didn't think of the remote database case.

Assuming we manage to check this conditionally, I suppose we could add a parameter to choose that. Perhaps a class attribute of BaseField. I don't think we need to pass this through all methods.

Would you like to investigate this?

lafrech avatar Nov 07 '19 08:11 lafrech

I'll provide some pseudo codes to demonstrate the possible implementation in my imagine.


First of all, there is a comparator attribute on the BaseField class which is a function with two inputs (the old value and the new value) and one boolean output (represents whether the values are actually the same or not).

# pseudo code
class BaseField:
    comparator = lambda old_value, new_value: old_value == new_value

Since each subclass of BaseField knows the data type it holds, it can override the comparator to perform the equivalence test on the specific data type.

# pseudo code
class CustomField(BaseField):
    # assume the data in this field is a dict with a specific structure
    # key "key" holds the information about whether the value changed or not
    comparator = lambda old_value, new_value: old_value["key"] == new_value["key"]

The comparator is invoked inside the BaseDataProxy#set() method to test if the data is modified. If the comparator returns True, means the two values are actually the same, the data is not marked as modified; otherwise, if the comparator returns False, it is modified.

def set(self, name, value, to_raise=KeyError):
    # ... skipped #
    if not field.comparator(old_value, new_value):
        self._data[name] = new_value
        self._mark_as_modified(name)

To keep the compatibility with the previous behavior, the comparator can defaults to a function that always returns False in the BaseField class, which will always mark dirty if BaseDataProxy#set() is invoked.

# pseudo code
class BaseField:
    comparator = lambda o, n: False # for compatibility

    def __init__(self, comparator=None):
        self.comparator = comparator or self.comparator

Besides subclassing, develops can provide comparator keyword argument to override the behavior on an existing subclass of BaseField.

I think most of the fields can be compared via == operator. If "huge nested structure" is a concern for some of the fields, we can make comparators of those fields always return False.

But I'm not really sure which fields, except DictField and ListField (and EmbeddedField maybe?), may contains huge nested structure. Hope that you can help to list those classes out so I can skip their comparators.


What do you think about this? Does it seem feasible?

I can prepare a PR during this weekend.

momocow avatar Nov 07 '19 10:11 momocow

Yes, I'm thinking of EmbeddedField, alone or in a container field such as List (or Dict when moving to marshmallow 3).

I'd rather use normal functions than lambdas. I thought == would work for all fields. I didn't think we'd need a dedicated function.

Couldn't it just be

def set(self, name, value, to_raise=KeyError):
    # ... skipped #
    if value != self._data[name]:
        self._data[name] = new_value
        self._mark_as_modified(name)

Don't forget delete method.

The flag activating the feature can also be in BaseDataProxy:

class BaseDataProxy:
    CHECK_MODIFIED = False

def set(self, name, value, to_raise=KeyError):
    # ... skipped #
    if self.CHECK_MODIFIED and value != self._data[name]:
        self._data[name] = new_value
        self._mark_as_modified(name)

lafrech avatar Nov 07 '19 10:11 lafrech

The reason I use a dedicated function is that I'm not sure if == can satisfied all types of fields, therefore I provide a way for developers to override the function via subclass.

But as you said if == should work for all fields, then I think using == has no problem.


One more question, if the flag is as an attribute of BaseDataProxy, how should a developer activate the feature?

class BaseDataProxy:
    CHECK_MODIFIED = False

@instance.register
class MyDocument(Document):
    # how to activate the check_modified feature?
    pass

Many thanks.

momocow avatar Nov 07 '19 16:11 momocow

I suppose this would work.

from umongo.dataproxy import BaseDataProxy

BaseDataProxy.CHECK_MODIFIED = True

But yes, it looks ugly.

I'm totally open to alternative proposals. For this and for the name of the flag as well.

lafrech avatar Nov 07 '19 16:11 lafrech

Shouldn't this affect all subclasses of BaseDataProxy? It sounds like this is a global flag.

BaseDataProxy.CHECK_MODIFIED = True

I was thinking about to activate the feature per Document class (typically means a collection) or per field.

Is there a way to provide the flag on Document classes, maybe on the Meta class?

I guess in some use cases, people may want to set the global flag to True while escaping checking on certain collections or fields which contain huge nested structures.

momocow avatar Nov 08 '19 03:11 momocow

Another incentive to implement this is the case where one wants to forbid modifying an attribute once the document is saved. This could be done in a pre_update processor that would check if the attribute was modified. Not perfect as it will only complain on save, not on attribute modification. I didn't find a better way, though.

lafrech avatar Nov 19 '19 15:11 lafrech

@momocow Yes, my proposal was a global flag. But I have no objection to a per-collection flag. Can't investigate this right now.

lafrech avatar Nov 19 '19 15:11 lafrech

I'm testing on my own branch, making it as a per-collection flag and a per-field flag also. A demo snippet will be provided when I think I'm ready ;)

momocow avatar Dec 04 '19 06:12 momocow