django-model-utils icon indicating copy to clipboard operation
django-model-utils copied to clipboard

select_sublcasses does not work with prefetch_related

Open Anislav opened this issue 11 years ago • 8 comments
trafficstars

Consider having the following example models:

from django.db import models
from model_utils.managers import InheritanceManager

class Post(models.Model):
    creation_date = models.DateTimeField(auto_now_add=True)
    objects = InheritanceManager()

class TextPost(Post):
    body = models.TextField()

class Photo(models.Model):
    image = models.ImageField(upload_to='photos')

class PhotoPost(Post):
    photos = models.ManyToManyField(Photo)

When I attempt to select all posts and prefetch PhotoPost's photos:

Post.objects.select_subclasses().prefetch_related('photos').all()

The following error is raised:

    (0.001) SELECT `photos_post`.`id`, `photos_post`.`creation_date`, `photos_textpost`.`post_ptr_id`, `photos_textpost`.`body`, `photos_photopost`.`post_ptr_id` FROM `photos_post` LEFT OUTER JOIN `photos_textpost` ON (`photos_post`.`id` = `photos_textpost`.`post_ptr_id`) LEFT OUTER JOIN `photos_photopost` ON (`photos_post`.`id` = `photos_photopost`.`post_ptr_id`) LIMIT 21; args=()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/anislav/.virtualenvs/mysite/lib/python2.7/site-packages/django/db/models/query.py", line 77, in __repr__
    data = list(self[:REPR_OUTPUT_SIZE + 1])
  File "/Users/anislav/.virtualenvs/mysite/lib/python2.7/site-packages/django/db/models/query.py", line 102, in __iter__
    len(self)
  File "/Users/anislav/.virtualenvs/mysite/lib/python2.7/site-packages/django/db/models/query.py", line 94, in __len__
    self._prefetch_related_objects()
  File "/Users/anislav/.virtualenvs/mysite/lib/python2.7/site-packages/django/db/models/query.py", line 613, in _prefetch_related_objects
    prefetch_related_objects(self._result_cache, self._prefetch_related_lookups)
  File "/Users/anislav/.virtualenvs/mysite/lib/python2.7/site-packages/django/db/models/query.py", line 1742, in prefetch_related_objects
    (attr, first_obj.__class__.__name__, lookup))
AttributeError: Cannot find 'photos' on TextPost object, 'photos' is an invalid parameter to prefetch_related()

Is prefetch_related supported in combination with select_sublcasses? Note that I have tested this on Django 1.5 and 1.6 with the latest release of django-model-utils.

Disclaimer: the given example is reworked version of the following one: http://stackoverflow.com/questions/19511368/prefetch-related-with-django-1-5-django-model-utils

Anislav avatar Feb 03 '14 11:02 Anislav

I wouldn't be entirely surprised to find that the InheritanceManager doesn't play nicely with prefetch_related; there are no testcases which demonstrate whether it should or should not work that I can see, and to the best of my understanding of how prefetch_related works, it makes certain assumptions, one of which is that the _result_cache is a homogenous set of objects, which is at odds with what InheritanceManager is all about.

I assume from your opening the ticket that .prefetch_related('photopost__photo') doesn't do the expected thing either?

kezabelle avatar Feb 04 '14 08:02 kezabelle

If prefetch_related cannot be supported, or has caveats in how it must be used, we should probably clarify that position in the documentation, and in test cases which are marked as @expectedFailure where appropriate.

kezabelle avatar Feb 04 '14 08:02 kezabelle

Yes, it does not work in the expected by me way when .prefetch_related('photopost__photo') is used. In this case too many SQL queries are executed which is not optimal. Look at the samples below:

When I execute the following:

posts = list(Post.objects.select_subclasses().prefetch_related('photopost__photos').all())

The following 3 SQL queries are executed:

(0.025) SELECT `photos_post`.`id`, `photos_post`.`creation_date`, `photos_textpost`.`post_ptr_id`, `photos_textpost`.`body`, `photos_photopost`.`post_ptr_id` FROM `photos_post` LEFT OUTER JOIN `photos_textpost` ON (`photos_post`.`id` = `photos_textpost`.`post_ptr_id`) LEFT OUTER JOIN `photos_photopost` ON (`photos_post`.`id` = `photos_photopost`.`post_ptr_id`); args=()
[DEBUG] Logger:django.db.backends, Date: 2014-02-04 08:40:01,155
File: /Users/anislav/.virtualenvs/mysite/lib/python2.7/site-packages/django/db/backends/util.py, func/method: execute(), line: 51

    (0.008) SELECT `photos_post`.`id`, `photos_post`.`creation_date`, `photos_photopost`.`post_ptr_id` FROM `photos_photopost` INNER JOIN `photos_post` ON (`photos_photopost`.`post_ptr_id` = `photos_post`.`id`) WHERE `photos_photopost`.`post_ptr_id` IN (1, 2, 3); args=(1, 2, 3)
[DEBUG] Logger:django.db.backends, Date: 2014-02-04 08:40:01,157
File: /Users/anislav/.virtualenvs/mysite/lib/python2.7/site-packages/django/db/backends/util.py, func/method: execute(), line: 51

    (0.000) SELECT (`photos_photopost_photos`.`photopost_id`) AS `_prefetch_related_val`, `photos_photo`.`id`, `photos_photo`.`image` FROM `photos_photo` INNER JOIN `photos_photopost_photos` ON (`photos_photo`.`id` = `photos_photopost_photos`.`photo_id`) WHERE `photos_photopost_photos`.`photopost_id` IN (2, 3); args=(2, 3)

It looks like the first query finds that there are Posts with ids 1, 2 & 3. Then the second one finds that only those with ids: 2 & 3 are of PhotoPost type. The last query selects the Photos for all PhotoPosts. Although, the Photos are retrieved it seems that they are not cached because when I try to retrieve the Photos for one of the PhotoPosts, a new SQL query is executed:

>>> posts[1].photos.all()
[DEBUG] Logger:django.db.backends, Date: 2014-02-04 08:40:32,993
File: /Users/anislav/.virtualenvs/mysite/lib/python2.7/site-packages/django/db/backends/util.py, func/method: execute(), line: 51

    (0.001) SELECT `photos_photo`.`id`, `photos_photo`.`image` FROM `photos_photo` INNER JOIN `photos_photopost_photos` ON (`photos_photo`.`id` = `photos_photopost_photos`.`photo_id`) WHERE `photos_photopost_photos`.`photopost_id` = 2  LIMIT 21; args=(2,)
[<Photo: Photo object>, <Photo: Photo object>]

Anislav avatar Feb 04 '14 08:02 Anislav

The fact they're not cached may, tentatively be related to the solution proposed in #78.

kezabelle avatar Feb 04 '14 09:02 kezabelle

Probably, you are right regarding the Photos cache part of the problem.

Anyway, executing the following

posts = list(Post.objects.select_subclasses().prefetch_related('photopost__photos').all())

results in 3 SQL queries which is an issue from my point of view.

Anislav avatar Feb 04 '14 09:02 Anislav

Hi @Anislav, thanks for the report. I think @kezabelle is likely correct that #78 would fix the reverse-related cache, which seems like part of the fix needed here.

Regarding the three queries, I'm not sure what's going on there, would need to dig into it. Probably won't have time to do that soon, but if you figure out the cause I'd be happy to take a pull request!

carljm avatar Feb 04 '14 18:02 carljm

I also bumped into this problem with prefetch_related as mentioned in https://github.com/jazzband/django-model-utils/issues/110#issuecomment-34044670

Django version: 1.11.16 django-model-utils version: 3.1.2 Python version: 3.5 Windows machine

Looks like the prefetch cache is not where one would expect it. With Post.objects.select_subclasses().prefetch_related('photopost__photos').all() we would expect all Post objects to have prefetched their PhotoPosts and their Photos. So looping over all PhotoPosts (as returned by select_subclasses) we would expect their Photos to be prefetched and photopost_instance.photos.all() not to result in extra queries. However, the actual prefetch result seems to be stored in photopost_instance.photoposts.photos.all() (so with the extra photoposts in there). Using this last way of accessing the photos does not result in extra queries, but it is a bit weird to have to do so.

I suspect that this happens because prefetch_related is handled after the iterator has converted the found instances to the appropriate child classes.

Unfortunately I do not have time (or the know-how) to make a PR to fix this. (I was hoping this package would be plug-and-play to save me time. ;-) ) So for now I'm going with the weird but working fix of using the extra classname when looking up the prefetched objects. I just wanted to make it known that the bug still exists and what might be causing it.

BrendaH avatar Oct 03 '18 15:10 BrendaH

I've bumped into this as well. From what I can see, this is hard to fix.

As workaround, you could override QuerySet._prefetch_related_objects() to implement some manual prefetching. In example, split the subclasses by model type, and perform manual prefetches for each of them.

vdboor avatar Jan 23 '24 13:01 vdboor