umongo icon indicating copy to clipboard operation
umongo copied to clipboard

Add projection feature to Reference.fetch()

Open lafrech opened this issue 7 years ago • 2 comments

As proposed in this comment.

I would try to have fetch signature close to find/find_one signatures, for consistency. Something like this, so we could pass projection or any other find_one argument:

    def fetch(self, *args, force_reload=False, **kwargs):
        if not self._document or force_reload:
            if self.pk is None:
                raise ReferenceError('Cannot retrieve a None Reference')
            self._document = self.document_cls.find_one(self.pk, *args, **kwargs)
            if not self._document:
                raise ValidationError(self.error_messages['not_found'].format(
                    document=self.document_cls.__name__))
        return self._document

There's a trap, here, though, due to the caching mechanism.

Just like you can load partial documents, and when you do that you should be aware that some fields are not loaded, you could partially fetch and when doing so you should be aware of the partial load when accessing fetched document.

The trick is that validation of reference fields when saving referencer does a fetch() as well, and for performance reasons, I'd really like to use a projection here, but this populates the cache behind our back with an almost empty document.

This would not happen if we did not use fetch in io_validate:

def _reference_io_validate(field, value):
    if value.document_cls.find_one(value.pk, projection={'_id': True}) is None:
        raise ValidationError(value.error_messages['not_found'].format(
            document=value.document_cls.__name__))

In fact, this would make validation more efficient, I suppose, because we would not instantiate every referenced Document when saving referencer.

This code is pretty much what I suggested to put in an exists method of Reference. You said you didn't like the idea ("too low level").

class PyMongoReference(Reference):

    @property
    def exists(self):
        return self.document_cls.find_one(self.pk, projection={'_id': True}) is not None

def _reference_io_validate(field, value):
    if not value.exists:
        raise ValidationError(value.error_messages['not_found'].format(
            document=value.document_cls.__name__))

I thing I'm beginning to see what you meant by "too low level". Should we have some sort of cook_find_projection function in tools.py? AFAIU, we can currently pass a projection argument to find_one but we should feed it DB names, not OO names, right?

lafrech avatar Mar 07 '17 09:03 lafrech

This code is pretty much what I suggested to put in an exists method of Reference. You said you didn't like the idea ("too low level").

Well you're proving me there is a valid usecase ;-) I'm thinking about another aproach for the fetch cache trouble : we could keep the fetched data as a dict with the projection as key. This way the almost-empty fetch done in _reference_io_validate wouldn't mess with the next fetchs done by the user (if we do this, we would probably need a cached=True parameter in fetch to let the user being able to avoid ending up with a huge cache of multiple projections in the case of a long-lived object)

What do you think ?

touilleMan avatar Mar 12 '17 17:03 touilleMan

Regarding the cache, I think my proposal is more in line with partial Document load: once a Document is loaded partial, you should know it and reload if you want more fields. So in fetch, I'd let the user manage projections. And stay away from the fetch cache when just checking existence.

I'm a bit scared by the cache-all-projections approach, although in practice, I doubt the cache would explode.

I'm realizing that projection is a bit of a hole in the framework. Currently, it is possible in find because kwargs are blindly passed to collection.find. This means that projection keys must be passed as mongo names (attribute), not object names. Maybe uMongo could provide a layer here, like cook_find_filter does for find keys.

Also, to stay in line with find, we could pass kwargs in reload, to allow projection or other options in there as well. (We could be tempted to remember the projection used in find and offer to reapply it in reload, but knowing the internals, this is just asking for trouble, so let's keep simple and let the caller know and remember what he wants.)

lafrech avatar Mar 12 '17 19:03 lafrech