Supporting Alternate Values for MongoDB fields
Hello,
I have a somewhat different need/requirement in how I use mongoengine and wanted to share the below to get any ideas on how to deal with it.
My requirement is that for some fields in a document, I want to be able to keep two values. One is a system-default value and the other a user-specified value. When accessing this field, it should return back a value in the following manner:
- if a user-specified value exists, return that OR
- if a user-specified value doesn’t exist, return the default
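The fallback rule above can be sketched in plain Python, independent of mongoengine. The function name `resolve` is hypothetical, and the key names `_active` and `_default` are chosen to match the field code further down in this post; this is only an illustration of the lookup order, not a finished implementation.

```python
def resolve(stored):
    """Return the user-specified value if present, else the system default."""
    if not isinstance(stored, dict):
        # Not a two-value container: hand it back unchanged.
        return stored
    for key in ("_active", "_default"):  # user value wins over the default
        if key in stored:
            return stored[key]
    return None  # neither value has been set


print(resolve({"_active": "one", "_default": "two"}))
print(resolve({"_default": "two"}))
```

Checking for key presence rather than truthiness means an empty string or 0 set by the user still counts as a user-specified value.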
I cannot use the default attribute that mongoengine already provides because:
- It is applied to the mongoengine class model rather than the mongodb document (defaults can differ from document to document)
- The default value would need to be updated based on system changes in newer versions of the software
- There would not be a memory of what the default is at the document level so it would be difficult to determine if the default needs to be changed for certain documents only
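To illustrate why remembering the default per document matters, here is a rough sketch of how a software upgrade might refresh defaults. The helper `refresh_default` and the sample values are assumptions made up for this example; they operate on plain dicts rather than real mongoengine documents.

```python
def refresh_default(stored, old_default, new_default):
    """Replace '_default' only where the document still carries the old system default."""
    updated = dict(stored)  # work on a copy, leave the input untouched
    if updated.get("_default") == old_default:
        updated["_default"] = new_default
    return updated


docs = [
    {"_active": "mine", "_default": "v1"},  # user override, old default
    {"_default": "v1"},                     # old default only
    {"_default": "custom"},                 # document-specific default
]
upgraded = [refresh_default(d, "v1", "v2") for d in docs]
print(upgraded)
```

The user-specified `_active` value is never touched, and a document whose default was set to something other than the old system default is left alone.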
I thought of creating something called an AlternateValueField, which is based off a MapField. Below is the code for this new field type:
import mongoengine


class AlternateValueField(mongoengine.MapField):

    def __init__(self, field=None, *args, **kwargs):
        self.allowed_keys = (
            '_active',   # Active Value - changeable by user
            '_default',  # Default Value - provided by system
        )
        # Remove the field's default since it now gets handled
        # as _default
        self.kwargs_default = kwargs.pop('default', None)
        super().__init__(field=field, *args, **kwargs)

    def __get__(self, instance, owner):
        result = super().__get__(instance, owner)
        if isinstance(result, dict):
            if '_active' in result.keys():
                return result['_active']
            if '_default' in result.keys():
                return result['_default']
        return result

    def __set__(self, instance, value):
        instance_dict = instance._data.get(self.name)
        if isinstance(value, dict):
            if isinstance(instance_dict, dict):
                # To keep the order (_active followed by _default)
                new_value = {}
                for key in self.allowed_keys:
                    if (key in instance_dict.keys()
                            and key not in value.keys()):
                        new_value[key] = instance_dict[key]
                    elif key in value.keys():
                        new_value[key] = value[key]
                value = new_value
        elif value is not None:
            new_value = {
                '_active': value,
            }
            if isinstance(instance_dict, dict):
                if '_default' in instance_dict.keys():
                    new_value['_default'] = instance_dict['_default']
            value = new_value
        else:
            if self.required and self.default is not None:
                new_value = {
                    '_default': self.kwargs_default,
                }
                value = new_value
        self._check_for_allowed_keys(value)
        super().__set__(instance, value)

    def _check_for_allowed_keys(self, value):
        if value is None:
            return
        for key in value.keys():
            if key in self.allowed_keys:
                continue
            err_msg = (
                f"Key '{key}' is not allowed."
                f" Allowed keys: {self.allowed_keys}"
            )
            raise KeyError(err_msg)

    def validate(self, value, **kwargs):
        super().validate(value, **kwargs)
        self._check_for_allowed_keys(value)
A document example that uses it:
class MySchedule(mongoengine.Document):
    my_name = AlternateValueField(mongoengine.StringField())
    sched1 = mongoengine.StringField()
Using this model to create a document:
>>> s2 = MySchedule(my_name={'_active': 'one', '_default': 'two'}, sched1='yes')
>>> s2.validate()
>>> s2.my_name # returns just the value (of active or default) instead of dict
'one'
>>> s2.to_json()
'{"my_name": {"_active": "one", "_default": "two"}, "sched1": "yes"}'
>>> s2.to_mongo()
SON([('my_name', {'_active': 'one', '_default': 'two'}), ('sched1', 'yes')])
>>> s2._fields
{'my_name': <__main__.AlternateValueField object at 0x7fd37fbaf1c0>, 'sched1': <mongoengine.fields.StringField object at 0x7fd37f968250>, 'id': <mongoengine.base.fields.ObjectIdField object at 0x7fd378225640>}
>>> s2._data
{'id': None, 'my_name': {'_active': 'one', '_default': 'two'}, 'sched1': 'yes'}
>>> s2.my_name = 'anotherone' # works due to the __set__() implementation above
>>> s2.my_name
'anotherone'
>>> s2._data
{'id': None, 'my_name': {'_active': 'anotherone', '_default': 'two'}, 'sched1': 'yes'}
>>> s2.my_name['_active'] = 'blah' # won't work now
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
>>> s2._data['my_name']['_active'] = 'blah' # will work
>>> s2.my_name
'blah'
>>> s2._data
{'id': None, 'my_name': {'_active': 'blah', '_default': 'two'}, 'sched1': 'yes'}
My issue is, I would like to be able to use these three cases:
- s2.my_name # returns the _active value or _default value, not dict of them
- s2.my_name = 'something' # works due to the __set__() implementation
- s2.my_name['_active'] = 'blah' # causes error as shown above
The 3rd case above is what I'd like to be able to do without causing an error.
I tried the following in an attempt to get around it:
- overriding __get__(), which doesn't work because its job is done before it can know that a getitem call follows
- overriding BaseDict.__getitem__() to recognize this, but that won't work when there is no getitem call
I would need something after the AlternateValueField's __get__() call and before BaseDict's __getitem__() call to recognize whether a getitem follows it or not.
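For context, Python evaluates `s2.my_name['_active'] = 'blah'` in two steps: the descriptor's `__get__()` runs first, and `__setitem__()` is then called on whatever object it returned, so `__get__()` cannot see what comes next. One possible direction, sketched below in plain Python without mongoengine, is to return a `str` subclass that remembers the backing dict. The names `AlternateStr`, `AlternateValue` and `Schedule` are hypothetical stand-ins, and this has not been tried against mongoengine's own field machinery or change tracking.

```python
class AlternateStr(str):
    """A str that also accepts item access on '_active' / '_default'."""

    def __new__(cls, backing):
        # Visible text: user-specified value first, then the system default.
        visible = backing.get("_active", backing.get("_default", ""))
        obj = super().__new__(cls, visible)
        obj._backing = backing  # shared with the owner, not copied
        return obj

    def __getitem__(self, key):
        if key in ("_active", "_default"):
            return self._backing[key]
        return super().__getitem__(key)  # normal str indexing and slicing

    def __setitem__(self, key, value):
        if key not in ("_active", "_default"):
            raise KeyError(key)
        self._backing[key] = value


class AlternateValue:
    """Plain-Python descriptor standing in for the mongoengine field."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return AlternateStr(instance._data.setdefault(self.name, {}))

    def __set__(self, instance, value):
        stored = instance._data.setdefault(self.name, {})
        if isinstance(value, dict):
            stored.update(value)
        else:
            stored["_active"] = value


class Schedule:
    my_name = AlternateValue()

    def __init__(self, **values):
        self._data = {}
        for key, value in values.items():
            setattr(self, key, value)


s2 = Schedule(my_name={"_active": "one", "_default": "two"})
s2.my_name = "anotherone"
s2.my_name["_active"] = "blah"  # reaches AlternateStr.__setitem__
print(s2.my_name)
print(s2._data)
```

This only covers string values; int or list fields would each need their own wrapper type. A wrapper already held in a variable also keeps its old text after the backing dict changes, so the attribute has to be read again to see the new value.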
Any thoughts on best way to get around this? Obviously, the above may just all be a bad idea. Any alternatives (pardon the pun :)) on creating fields with alternative values that can be changed on a per-document basis and remembered across restarts of the software? Fields can have all the types mongoengine supports, but mostly will be string, int, and potentially, list.
To be honest I haven't read the entire description, but here are 2 layouts that you can assess:
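from mongoengine import Document, StringField

class MyDocument(Document):
    system_value = StringField()
    user_value = StringField()

    @property
    def value(self):
        # Prefer the user value; fall back to the system value
        return self.user_value if self.user_value else self.system_value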
BUT this would not work for querying (e.g. MyDocument.objects(value="whatever") wouldn't work because value is not stored in mongodb)
The other layout, with 3 fields:
class MyDocument(Document):
    system_value = StringField()
    user_value = StringField()
    value = StringField()

    def clean(self):  # https://docs.mongoengine.org/apireference.html#mongoengine.Document.clean
        # Keep the stored 'value' in sync before validation/save
        self.value = self.user_value if self.user_value else self.system_value
This would allow querying on value and if you only save and update by going through MyDocument instances, the 'value' field should be consistent.
Thank you for your response @bagerard. I agree with you on the first option, and had considered both ways prior to attempting my way above. The reason I was not satisfied with having the multiple fields is that the class itself would get cluttered when we start having more of these. Our Document class may typically have 8-12 of these singular value fields and that would turn out to be 16-24 (for 2 fields per value) or 24-36 (for 3 fields per value) fields that we would have to keep track of in the class itself. Seemed too much ... :)
OK indeed that's going to be a mess if you go that way.
You should be able to bake some of that behavior into a nested field (an EmbeddedDocument)
class AlternateField(EmbeddedDocument):
    system_value = StringField()
    user_value = StringField()
    value = StringField()

    def clean(self):
        # Keep the stored 'value' in sync before validation/save
        self.value = self.user_value if self.user_value else self.system_value

class MyDoc(Document):
    fancy_field1 = EmbeddedDocumentField(AlternateField)
    fancy_field2 = ...
    ...
And then you would query using the nested field (note the double underscore):
MyDoc.objects(fancy_field1__value='whatever')
I haven't tested this so the code might require some adjustment, but conceptually it should work
Sorry, I have been out of town. Thank you for this pointer. I'll definitely test this out and see if it works for all the different cases I had laid out earlier. Again, much appreciated. Will respond back with my findings.
Closing this but we can reopen later if need be