graphene-django icon indicating copy to clipboard operation
graphene-django copied to clipboard

`select_related` and `prefetch_related`

Open khanetor opened this issue 7 years ago • 38 comments

When I inspected the actual SQL queries being run against DB, I found that for foreignkey and manytomany relationships, we have the n+1 problem.

For example, suppose there are 5 teams, each with 11 members, this query:

query {
  teams {
    edges {
      node {
        members {
          edges {
            node {
              name
            }
          }
        }
      }
    }
  }
}

will result in 1 query for selecting teams, and 5 more queries for selecting members of each team. 6 queries in total.

Normally, we would do a query like this Team.objects.all().prefetch_related('members') to reduce to 2 queries.

I think it would be extremely beneficial if DjangoObjectType can detect special fields:

  • ForeignField
  • OneToOneField
  • OneToManyField
  • ManyToManyField and apply appropriate prefetch_related and/or select_related when such fields are present in graph queries.

khanetor avatar Nov 13 '16 15:11 khanetor

Is anyone currently working on this?

jaredkwright avatar Jan 27 '17 22:01 jaredkwright

I noticed in the docs on filtering with django_graphene you can define a resolve_xxxx function that returns a queryset. The example they provide is about filtering, but I think you could apply the select_related and prefetch_related modifications here as well.

from graphene import ObjectType
from graphene_django.filter import DjangoFilterConnectionField
from .models import Teams

class Query(ObjectType):
    teams = DjangoFilterConnectionField(CategoryNode)

    def resolve_teams(self, args, info):
        return Team.objects.all().prefetch_related('members')

http://docs.graphene-python.org/projects/django/en/latest/authorization/#queryset-filtering-on-lists

PeteAndersen avatar Feb 01 '17 00:02 PeteAndersen

I have some code which does some intelligent selective application of database optimisations. This version was written against graphene 0.X and is presented without any kind of guarantee. As and when I miraculously have time to work on it, I'd love to formalise this and get it working in a general case, but I've been saying that for about 9 months now so don't hold out hope.

The core idea is to add prefetch_FOO and optimize_FOO methods to your DjangoObjectType. The code will always do prefetch_related, even on FKs, and that will be applied automatically if applicable. The prefetched qs can be customised with the prefetch_FOO method, and further customised by optimize_FOO methods on the next DjangoObjectType. All optimisations are applied selectively - they won't be applied if the graphql query does not require that data.

Note that it also makes use of https://github.com/photocrowd/django-cursor-pagination

I'm in the middle of trying to do a graphene 1.0 update on a fairly large and complex API which hit a bunch of internals, it remains to be seen how well this approach holds up with graphene 1.0. However, I hope it is a somewhat useful start for anyone wanting to work on this.

https://gist.github.com/mjtamlyn/e8e03d78764552289ea4e2155c554deb

mjtamlyn avatar Feb 01 '17 10:02 mjtamlyn

I've updated the patterns in my gist to make more sense with Graphene 2.0. It also now includes a few additional features including manual post-processing functions and nested fields. This code with a health warning - it definitely makes my API faster because it saves thousands of queries, however I think it may also be the source of some raw python performance issues which is preventing further optimisation of the code. I'm also not yet running it in production, and it is only tested on Python 3.

mjtamlyn avatar Dec 11 '17 12:12 mjtamlyn

Hi all! I've been working on a library that tries to optimize the query by analyzing the gql query:

https://github.com/tfoxy/graphene-django-optimizer

Feel free to open issues if your use case doesn't work.

tfoxy avatar Jun 10 '18 21:06 tfoxy

I think graphene-django-optimizer is something we might want to mention in the docs.

zbyte64 avatar Oct 19 '18 17:10 zbyte64

@tfoxy thank you 🙇‍♂️! @zbyte64 mind adding a section in the docs about graphene-django-optimizer

mvanlonden avatar Mar 19 '19 15:03 mvanlonden

Gj, everyone.

Just a quick question: is there a performance drawback if I simply do a prefetch_related on any FK fields in my model?

Because I just tried graphene-django-optimizer, it seems to be breaking DjangoFilterConnectionField, so I have to customize resolve methods.

It seems both DjangoFilterConnectionField and graphene-django-optimizer offer their own resolvers. Hmm..... @tfoxy

update:

It kind of did work with DjangoFilterConnectionField if I provide a **kwargs placeholder. In your documentation I guess this was lacking

class Query:
    def resolve_listings(self, info, **kwargs):
        # this will play nice with DFCF

gotexis avatar Apr 05 '19 09:04 gotexis

@gotexis

Just a quick question: is there a performance drawback if I simply do a prefetch_related on any FK fields in my model?

You'd probably be select_related'ing your FKs on your model – prefetch_related would be for any other models that have an FK to the target model. These will both incur performance hits – select_related expands the number of joins that the database needs to perform, and the amount of data that the database needs to return to the Django app. prefetch_related increases the number of database queries that get made, one additional query for each thing being prefetched.

For small models, with few relations, and little data, this is probably fine to just do, but otherwise you may want to be smarter about when to add these queries, by using something like graphene-django-optimizer, or your own custom solution.

danpalmer avatar Apr 05 '19 12:04 danpalmer

I found this PR on Sailor https://github.com/mirumee/saleor/pull/2983/files, It might be useful, however is there any plan of having this supported by default ?

I personally tried graphene-django-optimizer and it does not solve all the problems for example: no way of prefetching when having a nested FK that has a nested reverse FK

mohamedrez avatar May 17 '19 15:05 mohamedrez

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jul 16 '19 16:07 stale[bot]

I think it would be still great if query optimization code could be integrated into graphene-django directly.

https://github.com/graphql-python/graphene-django/pull/693 made the situation a bit better but it doesn't really work if filters are used. For such a queryset it would need to be calculated according to the schema and using Prefetch.

sliverc avatar Jul 19 '19 07:07 sliverc

@sliverc is there any reason why you can't use the https://github.com/tfoxy/graphene-django-optimizer library? I'd like to avoid including more code into the library (that needs to be maintained) if there are alternative ways of doing it.

jkimbo avatar Jul 25 '19 17:07 jkimbo

@jkimbo First of all graphene-django-optimizer has limitations and doesn't seem to work when using DjangoFilterConnetionField or overwriting get_queryset on Node or deep nesting. And I don't see that this can easily be integrated into a separate library without ugly workarounds.

Secondly I see your point that you want to avoid adding code into the library when possible. In this case I don't see this issue as a feature though but a bug or a instability within graphene-django. Graphs are prone to query explosions and it is quite easy when using the full featureset of graphene-django that a user runs into a query explosion. Such query explosion should be avoided when possible without the user need to tediously configure a 3rd party library to avoid this (The saleor PR is a good example for this what effort is needed to get optimizer running with all its limitations)

sliverc avatar Jul 26 '19 07:07 sliverc

@sliverc I agree that “query explosions” are likely to happen however I think it is a hard problem and one that won’t necessarily be made easier by building an optimiser into the library. Having said that if you want to create a PR that adds optimisation features into graphene-django I will happily review it.

@tfoxy as author of the graphene-django-optimiser library what do you think? Should the library be integrated into graphene-django? Or are there things that we could do to make optimisations easier?

jkimbo avatar Jul 26 '19 10:07 jkimbo

graphene-django-optimizer was created as a hotfix to improve the performance of the project I was working while not making the codebase uglier (before this, we were making the optimizations at the root of every graph resolver). The code of graphene-django-optimizer is more of a hack as it uses undocumented properties from graphene-django. The code is not maintainable as it is now; I just kept adding tests and never took the time to think and refactor. If graphene changes substantially, it would be really difficult to reuse the same code. I'm no longer using Django at my current job so I lost my main drive of motivation to keep fixing it and improving it.

Maybe the best would be to find a way to make graphene more extensible so this kind of plugins can be better maintained. As I said, I'm no longer working with Django, but I can help to discuss and propose making some changes to this repo.

tfoxy avatar Jul 26 '19 18:07 tfoxy

Thanks for the reply @tfoxy . Sounds like you are right @sliverc , we should try and integrate some optimisations into graphene-django so that it is maintainable in the long run. I'm going to pick up #250 since it looks like a simple one. Do you have any time to work on adding other optimisations @sliverc ?

jkimbo avatar Jul 28 '19 10:07 jkimbo

Or @tfoxy if you’d like to merge some of your code into graphene-django that would be very helpful!

jkimbo avatar Jul 28 '19 10:07 jkimbo

@jkimbo Such query explosion should be avoided when possible without the user need to tediously configure a 3rd party library to avoid this (The [saleor PR]

You are most correct, we have to remember the end goal is to have a workable, production-ready API out of the box, or people are going to move to Node.

Maybe we should have a vote on the top priority features and I think optimization will be on top1-2. Sadly it is also one of the hardest.

The situation may be made better with the planned release of a better async core of Django and native Promise of Python. Both take time.

gotexis avatar Jul 28 '19 23:07 gotexis

@jkimbo I have posted here as stale bot was about to close this issue and I wanted avoid this and restart the discussion whether this is a valid issue.

I would love to work on this but I am very tight on time. Someone of our team might be able to work at it later on but do not know when this might happen.

But could you add pinned label to this issue that the stale bot doesn't close it again and this issue can stay on the radar?

sliverc avatar Jul 29 '19 08:07 sliverc

OK thanks @sliverc . If anyone else would like to work on adding some optimisations to graphene-django that would be great!

jkimbo avatar Aug 01 '19 07:08 jkimbo

I would like to contribute also

On Thu, 1 Aug 2019 at 08:56, Jonathan Kim [email protected] wrote:

OK thanks @sliverc https://github.com/sliverc . If anyone else would like to work on adding some optimisations to graphene-django that would be great!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/graphql-python/graphene-django/issues/57?email_source=notifications&email_token=AAJ5VKWIVY7N2SOH5FZJZELQCKJLJA5CNFSM4CWC3CXKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3JV6GY#issuecomment-517168923, or mute the thread https://github.com/notifications/unsubscribe-auth/AAJ5VKSD22QEOLQEQWCDCTDQCKJLJANCNFSM4CWC3CXA .

mohamedrez avatar Aug 01 '19 13:08 mohamedrez

I can contribute. I have two questions before doing anything:

  • Would these changes go into v2 or v3?
  • Would we use the same api as graphene-django-optimizer or would we have something more integrated, like for example being the default to optimize the queries?

I'm not sure when is v3 planned and if it would change much of underlaying code. As for the api, maybe there's a better way to handle things. I'm open for ideas. Maybe it would be good to have an optimize flag in the options of a node, with the possibility to set a default in the settings.py file.

tfoxy avatar Aug 01 '19 17:08 tfoxy

@tfoxy

Would these changes go into v2 or v3?

I am not expecting a huge amount of changes needed with v3 since the major breaking change is the dropping of Python 2 and the upgrade of graphql-core. I would suggest that any optimisation improvements could be based off v2.

Would we use the same api as graphene-django-optimizer or would we have something more integrated, like for example being the default to optimize the queries?

Since we're going to start building in query optimisations in the library I don't think there is any need to follow the api of graphene-django-optimizer. I think it would be great if the default would be that graphene-django optimized queries by default. Having said that if you see any breaking or potentially dangerous changes then having an opt out in settings would be great.


@tfoxy @mohamedrez I'm not sure how to break this task down into multiple steps really. @tfoxy any suggestions?

jkimbo avatar Aug 02 '19 12:08 jkimbo

Just a heads up @tfoxy @mohamedrez that I've had to give up on trying to automatically apply .only to all queries: https://github.com/graphql-python/graphene-django/issues/250#issuecomment-518006665

You might find that trying to automatically applying query optimisations has similar issues where they can actually make performance worse in some cases.

jkimbo avatar Aug 04 '19 14:08 jkimbo

I had the same problems with .only when working in graphene-django-optimizer. I decided to only apply .only when I'm sure that it will not trigger another query. That means that if there's a field on a graphql query that it's unknown, it will cancel the .only optimization for that sql query. With select_related and prefetch_related this does not happen. There was some discussion related to this: https://github.com/tfoxy/graphene-django-optimizer/issues/2 . That issue highlights the downside that it's confusing when the .only optimization is not applied in some cases.

In the following days, I will try to write about the thinking behind the implementation of graphene-django-optimizer and some suggestions about how to proceed by doing small steps.

tfoxy avatar Aug 05 '19 21:08 tfoxy

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Oct 04 '19 21:10 stale[bot]

Any update on these efforts?

easherma avatar Jan 10 '20 06:01 easherma

it would make sense reopening this

ambientlight avatar Apr 28 '20 07:04 ambientlight

Any status update here?

dvf avatar Jul 29 '20 22:07 dvf

feels like graphene has been totally abandoned unfortunately

Instrumedley avatar Jan 12 '22 18:01 Instrumedley

I reopened issue and i agree it is not resolved, but please don't assume there will be any development 🙁

ulgens avatar Jan 12 '22 19:01 ulgens