Supporting Alternate Values for MongoDB fields

Open bhairav13 opened this issue 3 years ago • 4 comments

Hello,

I have a somewhat different need/requirement in how I use mongoengine and wanted to share the below to get any ideas on how to deal with it.

My requirement is that some fields in a document need to keep two values: a system-default value and a user-specified value. Accessing the field should return a value as follows:

  • if a user-specified value exists, return that OR
  • if a user-specified value doesn’t exist, return the default
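The lookup rule above can be sketched in plain Python, independent of mongoengine. The function name `resolve` is hypothetical, and the `_active`/`_default` keys are borrowed from the field implementation shown further down in this post.

```python
def resolve(stored):
    """Return the user-specified value if one exists, else the system default."""
    # Key presence decides, so an empty string or 0 still counts as user-specified.
    if "_active" in stored:
        return stored["_active"]
    # Fall back to the system default (None if neither key is present).
    return stored.get("_default")


print(resolve({"_active": "one", "_default": "two"}))  # one
print(resolve({"_default": "two"}))                    # two
```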

I cannot use the default attribute that mongoengine already provides because:

  1. It is applied to the mongoengine class model rather than the mongodb document (defaults can differ from document to document)
  2. The default value would need to be updated based on system changes in newer versions of the software
  3. There would not be a memory of what the default is at the document level so it would be difficult to determine if the default needs to be changed for certain documents only

I thought of creating something called an AlternateValueField, which is based off a MapField. Below is the code for this new field type:

import mongoengine

class AlternateValueField(mongoengine.MapField):
    def __init__(self, field=None, *args, **kwargs):
        self.allowed_keys = (
            '_active',  # Active Value - changeable by user
            '_default',  # Default Value - provided by system
        )
        # Remove the field's default since it now gets handled
        # as _default
        self.kwargs_default = kwargs.pop('default', None)
        super().__init__(field=field, *args, **kwargs)

    def __get__(self, instance, owner):
        result = super().__get__(instance, owner)
        if isinstance(result, dict):
            if '_active' in result.keys():
                return result['_active']
            if '_default' in result.keys():
                return result['_default']
        return result

    def __set__(self, instance, value):
        instance_dict = instance._data.get(self.name)
        if isinstance(value, dict):
            if isinstance(instance_dict, dict):
                # To keep the order (_active followed by _default)
                new_value = {}
                for key in self.allowed_keys:
                    if (key in instance_dict.keys()
                            and key not in value.keys()):
                        new_value[key] = instance_dict[key]
                    elif key in value.keys():
                        new_value[key] = value[key]
                value = new_value
        elif value is not None:
            new_value = {
                '_active': value,
            }
            if isinstance(instance_dict, dict):
                if '_default' in instance_dict.keys():
                    new_value['_default'] = instance_dict['_default']
            value = new_value
        else:
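            # Test the popped default (kwargs_default), which is the value
            # that gets stored as _default below
            if self.required and self.kwargs_default is not None: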
                new_value = {
                    '_default': self.kwargs_default,
                }
                value = new_value
        self._check_for_allowed_keys(value)
        super().__set__(instance, value)

    def _check_for_allowed_keys(self, value):
        if value is None:
            return
        for key in value.keys():
            if key in self.allowed_keys:
                continue
            err_msg = (
                f"Key '{key}' is not allowed."
                f" Allowed keys: {self.allowed_keys}"
            )
            raise KeyError(err_msg)

    def validate(self, value, **kwargs):
        super().validate(value, **kwargs)
        self._check_for_allowed_keys(value)

A document example that uses it:

class MySchedule(mongoengine.Document):
    my_name = AlternateValueField(mongoengine.StringField())
    sched1 = mongoengine.StringField()

Using this model to create a document:

>>> s2 = MySchedule(my_name={'_active': 'one', '_default': 'two'}, sched1='yes')

>>> s2.validate()

>>> s2.my_name  # returns just the value (of active or default) instead of dict
'one'

>>> s2.to_json()
'{"my_name": {"_active": "one", "_default": "two"}, "sched1": "yes"}'

>>> s2.to_mongo()
SON([('my_name', {'_active': 'one', '_default': 'two'}), ('sched1', 'yes')])

>>> s2._fields
{'my_name': <__main__.AlternateValueField object at 0x7fd37fbaf1c0>, 'sched1': <mongoengine.fields.StringField object at 0x7fd37f968250>, 'id': <mongoengine.base.fields.ObjectIdField object at 0x7fd378225640>}

>>> s2._data
{'id': None, 'my_name': {'_active': 'one', '_default': 'two'}, 'sched1': 'yes'}

>>> s2.my_name = 'anotherone'  # works due to the __set__() implementation above

>>> s2.my_name
'anotherone'

>>> s2._data
{'id': None, 'my_name': {'_active': 'anotherone', '_default': 'two'}, 'sched1': 'yes'}

>>> s2.my_name['_active'] = 'blah'  # won't work now
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment

>>> s2._data['my_name']['_active'] = 'blah'  # will work

>>> s2.my_name
'blah'

>>> s2._data
{'id': None, 'my_name': {'_active': 'blah', '_default': 'two'}, 'sched1': 'yes'}

My issue is that I would like to be able to use all three of these cases:

  • s2.my_name # returns the _active value or _default value, not dict of them
  • s2.my_name = 'something' # works due to the __set__() implementation
  • s2.my_name['_active'] = 'blah' # causes error as shown above

The 3rd case above is what I'd like to be able to do without causing an error.

I tried the following in an attempt to get around it:

  • overriding __get__(), which doesn't work because its job is done before Python recognizes that a getitem call follows
  • overriding BaseDict.__getitem__() to recognize this, but that won't work when there is no getitem call

I would need something after the AlternateValueField's __get__() call and before BaseDict's __getitem__() call to recognize whether a getitem is following it or not.
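The ordering problem described above can be reproduced without mongoengine. This is a minimal sketch with hypothetical names (`TracedField`, `TracedDict`, `Doc`, `events`): the descriptor's `__get__` runs to completion and returns its result before Python applies the subscript to that result, so `__get__` has no way to know a subscript is coming. Note that the failing statement is an item assignment, so the hook that runs on the returned object is `__setitem__`.

```python
events = []  # records the order in which the hooks run


class TracedDict(dict):
    def __setitem__(self, key, value):
        events.append(f"__setitem__({key!r})")
        super().__setitem__(key, value)


class TracedField:
    """Data descriptor that logs attribute reads (stand-in for a field class)."""

    def __set_name__(self, owner, name):
        self.storage = "_" + name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        events.append("__get__")
        # Whatever is returned here is the object the subscript is applied to.
        return getattr(instance, self.storage)

    def __set__(self, instance, value):
        setattr(instance, self.storage, TracedDict(value))


class Doc:
    my_name = TracedField()


doc = Doc()
doc.my_name = {"_active": "one", "_default": "two"}
events.clear()

doc.my_name["_active"] = "blah"
print(events)  # ['__get__', "__setitem__('_active')"]
```

Because `__get__` always fires first, returning a plain `str` from it (as the field above does) means the subsequent item assignment lands on that `str`, which is what raises the TypeError.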

Any thoughts on the best way to get around this? Obviously, the above may just all be a bad idea. Any alternatives (pardon the pun :)) for creating fields with alternative values that can be changed on a per-document basis and remembered across restarts of the software? Fields can have all the types mongoengine supports, but mostly will be string, int, and potentially, list.

bhairav13 avatar Sep 02 '22 16:09 bhairav13

To be honest I haven't read the entire description, but here are 2 layouts that you can assess:

class MyDocument(Document):
    system_value = StringField()
    user_value = StringField()

    @property
    def value(self):
        return self.user_value if self.user_value else self.system_value

BUT this would not work for querying (e.g. MyDocument.objects(value="whatever") wouldn't work because value is not stored in MongoDB)

Other layout with 3 fields:

class MyDocument(Document):
    system_value = StringField()
    user_value = StringField()
    value = StringField()

    def clean(self):  # https://docs.mongoengine.org/apireference.html#mongoengine.Document.clean
        self.value = self.user_value if self.user_value else self.system_value

This would allow querying on value and if you only save and update by going through MyDocument instances, the 'value' field should be consistent.
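One detail worth checking in the layouts above: the fallback tests truthiness, so an empty string (or 0 for a numeric field) in user_value is treated as "not set". The sketch below uses two hypothetical helper functions in plain Python to contrast that with an explicit `None` check; which behavior is wanted depends on whether an empty user value should be allowed to override the system value.

```python
def pick_truthy(user_value, system_value):
    # Mirrors the fallback used in the layouts: any falsy user value is ignored.
    return user_value if user_value else system_value


def pick_not_none(user_value, system_value):
    # Only a missing (None) user value falls back to the system value.
    return user_value if user_value is not None else system_value


print(repr(pick_truthy("", "default")))    # 'default'
print(repr(pick_not_none("", "default")))  # ''
```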

bagerard avatar Sep 06 '22 19:09 bagerard

Thank you for your response @bagerard. I agree with you on the first option, and had considered both ways prior to attempting my way above. The reason I was not satisfied with having multiple fields is that the class itself would get cluttered once we start having more of these. Our Document class may typically have 8-12 of these singular value fields, and that would turn into 16-24 (for 2 fields per value) or 24-36 (for 3 fields per value) fields that we would have to keep track of in the class itself. Seemed too much ... :)

bhairav13 avatar Sep 08 '22 00:09 bhairav13

OK indeed that's going to be a mess if you go that way.

You should be able to bake some of that behavior into a nested field (an EmbeddedDocument)


class AlternateField(EmbeddedDocument):
    system_value = StringField()
    user_value = StringField()
    value = StringField()

    def clean(self):
        self.value = self.user_value if self.user_value else self.system_value


class MyDoc(Document):
    fancy_field1 = EmbeddedDocumentField(AlternateField)
    fancy_field2 = ...
    ...

And then you would query using the nested field, with a double underscore between the field names: MyDoc.objects(fancy_field1__value='whatever')

I haven't tested this, so the code might require some adjustment, but conceptually it should work

bagerard avatar Sep 08 '22 07:09 bagerard

Sorry, I have been out of town. Thank you for this pointer. I'll definitely test this out and see if it works for all the different cases I had laid out earlier. Again, much appreciated. Will respond back with my findings.

bhairav13 avatar Sep 19 '22 05:09 bhairav13

Closing this, but we can reopen later if need be

bagerard avatar Dec 28 '22 10:12 bagerard