umongo
umongo copied to clipboard
Add projection feature to Reference.fetch()
As proposed in this comment.
I would try to have fetch signature close to find/find_one signatures, for consistency. Something like this, so we could pass projection or any other find_one
argument:
def fetch(self, *args, force_reload=False, **kwargs):
if not self._document or force_reload:
if self.pk is None:
raise ReferenceError('Cannot retrieve a None Reference')
self._document = self.document_cls.find_one(self.pk, *args, **kwargs)
if not self._document:
raise ValidationError(self.error_messages['not_found'].format(
document=self.document_cls.__name__))
return self._document
There's a trap, here, though, due to the caching mechanism.
Just like you can load partial documents, and when you do that you should be aware that some fields are not loaded, you could partially fetch and when doing so you should be aware of the partial load when accessing fetched document.
The trick is that validation of reference fields when saving referencer does a fetch() as well, and for performance reasons, I'd really like to use a projection here, but this populates the cache behind our back with an almost empty document.
This would not happen if we did not use fetch in io_validate
:
def _reference_io_validate(field, value):
if value.document_cls.find_one(value.pk, projection={'_id': True}) is None:
raise ValidationError(value.error_messages['not_found'].format(
document=value.document_cls.__name__))
In fact, this would make validation more efficient, I suppose, because we would not instantiate every referenced Document when saving referencer.
This code is pretty much what I suggested to put in an exists
method of Reference. You said you didn't like the idea ("too low level").
class PyMongoReference(Reference):
@property
def exists(self):
return self.document_cls.find_one(self.pk, projection={'_id': True}) is not None
def _reference_io_validate(field, value):
if not value.exists:
raise ValidationError(value.error_messages['not_found'].format(
document=value.document_cls.__name__))
I thing I'm beginning to see what you meant by "too low level". Should we have some sort of cook_find_projection
function in tools.py
? AFAIU, we can currently pass a projection
argument to find_one
but we should feed it DB names, not OO names, right?
This code is pretty much what I suggested to put in an exists method of Reference. You said you didn't like the idea ("too low level").
Well you're proving me there is a valid usecase ;-)
I'm thinking about another aproach for the fetch cache trouble : we could keep the fetched data as a dict with the projection as key. This way the almost-empty fetch done in _reference_io_validate
wouldn't mess with the next fetchs done by the user (if we do this, we would probably need a cached=True
parameter in fetch
to let the user being able to avoid ending up with a huge cache of multiple projections in the case of a long-lived object)
What do you think ?
Regarding the cache, I think my proposal is more in line with partial Document load: once a Document is loaded partial, you should know it and reload if you want more fields. So in fetch
, I'd let the user manage projections. And stay away from the fetch cache when just checking existence.
I'm a bit scared by the cache-all-projections approach, although in practice, I doubt the cache would explode.
I'm realizing that projection is a bit of a hole in the framework. Currently, it is possible in find
because kwargs are blindly passed to collection.find
. This means that projection keys must be passed as mongo names (attribute), not object names. Maybe uMongo could provide a layer here, like cook_find_filter
does for find keys.
Also, to stay in line with find
, we could pass kwargs in reload
, to allow projection or other options in there as well. (We could be tempted to remember the projection used in find
and offer to reapply it in reload
, but knowing the internals, this is just asking for trouble, so let's keep simple and let the caller know and remember what he wants.)