
Refs #28455 - Implement a method by which querysets may internally be opted out of cloning operations.

Open kezabelle opened this issue 2 years ago • 3 comments

Picking up a fairly old ticket, 28455, to see if we can get discussion/implementation going again. Very much WIP. No tests added, none of the extras from the original branches, just a minimal API change to examine (+ a sample usage thereof) and iterate on. If all the tests pass, I'll consider that an OK start :p

kezabelle avatar Jul 20 '21 15:07 kezabelle

@kezabelle Do you have time to keep working on this?

felixxm avatar Oct 21 '21 06:10 felixxm

Yeah, I'm happy to do so. It's mostly still marked as WIP/needs improvement to get eyes on it, to decide how/if it should be implemented (e.g. use a @contextmanager decorator or a custom object? are the variables named OK?) and whether there are caveats/bits I've missed considering. Conceptually it's not difficult ("just don't clone"), but it's the ORM so there's always the possibility of hairy bits.

I'll fix the conflicts that now exist, and tidy up the flake8, and go from there.

kezabelle avatar Oct 21 '21 09:10 kezabelle

TL;DR: I've tidied it up, and it's ready to be discussed and/or independently benchmarked/profiled. There are example usages of the API in addition to the API itself.


Alright so, here's where this now stands:

For reasons unfathomable to me, CI has only run 2 builds. The others should pass, I believe/hope, but they're MIA. I've chopped it into a number of commits dealing with one 'bit' at a time; not all of them have to be merged, but together they demonstrate the API available and how we might work with it. I'll outline those below, along with performance numbers and method call counts.

Things that need discussion or looking at:

  • The API itself (use a context manager, or don't? etc)
    • Whether that API uses contextlib or a custom context manager (see below)
    • The naming of the API methods/attributes/etc introduced. Are they accurate, stylistically correct etc? Totally open to opinions.
  • Do the additional couple of tests, + those already running in the suite, cover enough, or are there things I need to add?

37ddd97d0aad08e87f28aa8eb3292a0a61476fdd (first commit) implements the proposed API, using contextlib.contextmanager. 9be13a26e6aed7f354884de6a07e2f046e604ff4 (fourth commit) then migrates that same API to a custom context manager for performance. Obviously I'm leaning towards the faster of the two, but if the concession has to be made to use contextmanager it's still a worthwhile change AFAIK.
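To make the choice concrete, here's roughly what the two shapes look like. This is a sketch only: the method names (_avoid_cloning, _disable_cloning, _enable_cloning) match the profiler output further down, but the bodies and the flag attribute are illustrative rather than the actual diff.

    from contextlib import contextmanager


    class QuerySet:
        # ...existing QuerySet internals elided...

        def _disable_cloning(self):
            # Illustrative flag; the real attribute name in the patch may differ.
            self._no_clone = True

        def _enable_cloning(self):
            self._no_clone = False

        # Option 1 (first commit): generator-based context manager via contextlib.
        @contextmanager
        def _avoid_cloning(self):
            self._disable_cloning()
            try:
                yield self
            finally:
                self._enable_cloning()

        # In the custom variant, _avoid_cloning() would instead return
        # AvoidCloning(self), defined below.


    # Option 2 (fourth commit): a hand-rolled context manager, which sidesteps
    # contextlib's per-call generator overhead.
    class AvoidCloning:
        def __init__(self, queryset):
            self.queryset = queryset

        def __enter__(self):
            self.queryset._disable_cloning()
            return self.queryset

        def __exit__(self, exc_type, exc_value, traceback):
            self.queryset._enable_cloning()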

5a7a3d39654e7ac5e204ea97d07d57966cf048d3 (third commit) removes a number of clone operations from standard prefetch_related usage (those not using a Prefetch); a minimum of 1 per queryset, or 2 if _db has a value (it's always 2 for GenericRelatedObjectManager, as its implementation differs slightly). This commit demonstrates the intended API using the context manager, as sketched below. This is substantially what I want to land.
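For context, the intended call-site usage is roughly the following; a sketch of the _apply_rel_filters shape rather than the exact diff:

    def _apply_rel_filters(self, queryset):
        # Sketch: this queryset exists purely for internal prefetching use,
        # so the usual defensive clones add nothing; suppress them for the
        # duration of the block.
        with queryset._avoid_cloning():
            queryset._add_hints(instance=self.instance)
            if self._db:
                queryset = queryset.using(self._db)  # normally a clone
            return queryset.filter(**self.core_filters)  # normally a clone too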

7311c2403bd120418e94272aca82dff96fb1a59e (last commit) uses the other, lower-level API (the one the context manager itself wraps) to remove either 1 or 2 clones when calling get_prefetch_queryset, if the queryset was created internally by that method (and thus can be safely toggled without bleeding the flag's state to an outer queryset). This would probably be nice to land, but I've only tacked it on today to demonstrate this usage of the API, so YMMV.
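Used raw, i.e. without the context manager, the pair looks roughly like this inside get_prefetch_queryset (again a sketch; it's only safe because the queryset is created locally and can't escape with cloning still disabled):

    def get_prefetch_queryset(self, instances, queryset=None):
        if queryset is None:
            # The queryset is created right here, so toggling the flag can't
            # bleed state into anything the caller already holds.
            queryset = super().get_queryset()
            queryset._disable_cloning()
            try:
                queryset._add_hints(instance=instances[0])
                queryset = queryset.using(queryset._db or self._db)  # would normally clone
            finally:
                queryset._enable_cloning()
        # ...carry on building the usual prefetch tuple with `queryset`...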


Now to attempt to briefly cover the performance. This is all slightly different than on the ticket, not least because a couple of other changes have been merged since I originally implemented this.
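For anyone wanting to reproduce the numbers, the test data amounts to roughly the following (the exact fixture script isn't in the PR, so treat this as a sketch):

    from django.contrib.auth.models import User

    # 1,000 users, each with 0 groups and 0 user_permissions, in a local
    # SQLite database.
    User.objects.bulk_create(User(username=f"user{i}") for i in range(1000))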

On main as of dd528cb2cefc0db8b91a7ff0a2bc87305b976597, using a local SQLite database of 1,000 User objects, each with 0 groups and 0 user_permissions, on a Mac laptop with 86% battery:

In [3]: %timeit -n 100 -r 5 tuple(User.objects.prefetch_related('groups', 'user_permissions'))
90.8 ms ± 1.31 ms per loop (mean ± std. dev. of 5 runs, 100 loops each)

(note: on previous runs I've seen this as low as 86ms)

Same database, using the contextlib.contextmanager and only the prefetch_related commit (i.e: executed against 5a7a3d39654e7ac5e204ea97d07d57966cf048d3):

In [2]: %timeit -n 100 -r 5 tuple(User.objects.prefetch_related('groups', 'user_permissions'))
76.6 ms ± 1.13 ms per loop (mean ± std. dev. of 5 runs, 100 loops each)

(note: on previous runs I've seen this at 79ms, so this is a goodish run for my laptop)

Same database, using the custom context manager (i.e. executed against 9be13a26e6aed7f354884de6a07e2f046e604ff4):

In [2]: %timeit -n 100 -r 5 tuple(User.objects.prefetch_related('groups', 'user_permissions'))
72.4 ms ± 851 µs per loop (mean ± std. dev. of 5 runs, 100 loops each)

(note: on previous runs using -n10 I've seen this get to 71ms)

and finally, at the last commit, 7311c2403bd120418e94272aca82dff96fb1a59e:

73.3 ms ± 1.18 ms per loop (mean ± std. dev. of 5 runs, 100 loops each)

Yeah, that last one doesn't look great. I can't tell whether it's actually worse or whether the computer is working against me, but I include it for honest completeness. I have seen shorter runs do better (67.5 ms ± 692 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)), but I can't currently say for certain that the last commit is a net improvement. To be fair, I think the changed code is only called once per prefetch (so e.g. it's called twice for 'groups', 'user_permissions'), so it's probably just CPU workload flux.


And now to look at the number of calls. These are for 1 iteration of tuple(User.objects.prefetch_related('groups', 'user_permissions')) (1000 users).
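The counts are pstats output from cProfile; something along these lines reproduces them (the regex filter is just to trim the report to the pertinent methods):

    import cProfile
    import pstats

    from django.contrib.auth.models import User

    profiler = cProfile.Profile()
    profiler.enable()
    tuple(User.objects.prefetch_related('groups', 'user_permissions'))
    profiler.disable()

    # Restrict the report to the clone/chain/prefetch machinery of interest.
    stats = pstats.Stats(profiler).sort_stats("ncalls")
    stats.print_stats(r"clone|_chain|_apply_rel_filters|get_queryset|contextlib|cloning")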

Baseline on main; pertinent methods only:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2007    0.010    0.000    0.019    0.000 query.py:292(clone)
     2007    0.005    0.000    0.028    0.000 query.py:1342(_clone)
     2000    0.004    0.000    0.061    0.000 related_descriptors.py:921(get_queryset)
     2000    0.004    0.000    0.041    0.000 related_descriptors.py:905(_apply_rel_filters)
     2007    0.002    0.000    0.030    0.000 query.py:1331(_chain)
     2007    0.002    0.000    0.021    0.000 query.py:340(chain)
     2007    0.002    0.000    0.004    0.000 where.py:142(clone)

Using the contextlib.contextmanager + patched _apply_rel_filters (i.e: at commit 5a7a3d39654e7ac5e204ea97d07d57966cf048d3):

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2000    0.007    0.000    0.024    0.000 related_descriptors.py:906(_apply_rel_filters)
     2000    0.004    0.000    0.043    0.000 related_descriptors.py:923(get_queryset)
     2003    0.002    0.000    0.020    0.000 contextlib.py:121(__exit__)
     2003    0.002    0.000    0.003    0.000 contextlib.py:86(__init__)
     2003    0.001    0.000    0.003    0.000 contextlib.py:112(__enter__)
     2003    0.001    0.000    0.004    0.000 contextlib.py:242(helper)
     4000    0.001    0.000    0.002    0.000 query.py:1341(_avoid_cloning)
     2007    0.001    0.000    0.001    0.000 query.py:1349(_chain)
        7    0.000    0.000    0.000    0.000 query.py:292(clone)
        7    0.000    0.000    0.000    0.000 query.py:1363(_clone)
        7    0.000    0.000    0.000    0.000 where.py:142(clone)
        7    0.000    0.000    0.000    0.000 query.py:340(chain)
     2000    0.000    0.000    0.000    0.000 query.py:1333(_disable_cloning)
     2000    0.000    0.000    0.000    0.000 query.py:1337(_enable_cloning)

Replacing the contextlib.contextmanager with the custom one:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2000    0.006    0.000    0.018    0.000 related_descriptors.py:906(_apply_rel_filters)
     2000    0.004    0.000    0.039    0.000 related_descriptors.py:923(get_queryset)
     2000    0.001    0.000    0.001    0.000 query.py:1373(_avoid_cloning)
     2007    0.001    0.000    0.001    0.000 query.py:1384(_chain)
     2000    0.001    0.000    0.001    0.000 query.py:190(__enter__)
     2000    0.001    0.000    0.001    0.000 query.py:193(__exit__)
     2000    0.000    0.000    0.000    0.000 query.py:187(__init__)
     2000    0.000    0.000    0.000    0.000 query.py:1354(_disable_cloning)
     2000    0.000    0.000    0.000    0.000 query.py:1363(_enable_cloning)
        7    0.000    0.000    0.000    0.000 query.py:340(chain)
        7    0.000    0.000    0.000    0.000 query.py:292(clone)
        7    0.000    0.000    0.000    0.000 query.py:1402(_clone)
        7    0.000    0.000    0.000    0.000 where.py:142(clone)

and then finally, the counts which change if also updating get_prefetch_queryset (i.e: the last commit; optional; an example of the less ergonomic API):

     2002    0.000    0.000    0.000    0.000 query.py:1354(_disable_cloning)
     2002    0.000    0.000    0.000    0.000 query.py:1363(_enable_cloning)
        1    0.000    0.000    0.000    0.000 query.py:292(clone)
        1    0.000    0.000    0.000    0.000 query.py:1402(_clone)
        1    0.000    0.000    0.000    0.000 where.py:142(clone)
        1    0.000    0.000    0.000    0.000 query.py:340(chain)

If any of the above doesn't make sense or looks amiss, do say. It's taken me hours to get all the ducks in order, so I'm sure there's ambiguity or downright incorrectness.

kezabelle avatar Nov 22 '21 17:11 kezabelle