strawberry-django icon indicating copy to clipboard operation
strawberry-django copied to clipboard

optimizer over fetching rows

Open tasiotas opened this issue 2 years ago • 12 comments

Hi,

It looks like Query Optimizer is over fetching rows when working with Relay connections. I works fine with List but not with Connections.

So when I build schema using List, query below gets executed as single SQL But when using Connection, I get two SQLs. I guess thats how it works? Or could it be "optimized" ?

strawberry-graphql-django = "^0.16.0" repro https://github.com/tasiotas/strawberry_debug

thank you

#query
query MyQuery {
  user(pk: "1") {
    products {
      edges {
        node {
          name
        }
      }
    }
  }
}
# first query
SELECT "app_user"."id"
  FROM "app_user"
 WHERE "app_user"."id" = '1'
 LIMIT 21
# second query
SELECT "app_product"."id",
       "app_product"."name",
       "app_product"."size", # <- this shouldnt be here, didnt ask for it in the query
       "app_product"."user_id"
  FROM "app_product"
 WHERE "app_product"."user_id" IN ('1')

Upvote & Fund

  • We're using Polar.sh so you can upvote and help fund this issue.
  • We receive the funding once the issue is completed & confirmed by you.
  • Thank you in advance for helping prioritize & fund our backlog.
Fund with Polar

tasiotas avatar Aug 16 '23 22:08 tasiotas

I think this happened because the optimizer currently is skipping any nested connections: https://github.com/strawberry-graphql/strawberry-graphql-django/blob/main/strawberry_django/optimizer.py#L320C29-L320C29

Maybe we should allow this as long as the parent is a single result and not a list? This would make that specific case you are reporting work, but you would still have that same issue if you were retrieving a list/connection of users and then a connection of products inside those

bellini666 avatar Aug 18 '23 21:08 bellini666

I see. Seems like a complete solution might take a while to figure out. Sure, if you don't mind patching my case with that workaround, I will gladly take it. Nested connections are rare, but definitely can happen.

Thanks

tasiotas avatar Aug 18 '23 23:08 tasiotas

@tasiotas ,

I was taking a look at this now and I'm a bit confused. The optimizer should actually be working for this because the products resolver would return a queryset, and that queryset would get optimized.

Can you show me how your python code is written for this type so I can try to reproduce it?

bellini666 avatar Aug 24 '23 19:08 bellini666

sure, take a look at this repro https://github.com/tasiotas/strawberry_debug

tasiotas avatar Aug 24 '23 19:08 tasiotas

@tasiotas oh sorry, I did not see the link

The first thing I noticed was that you passed a prefetch_related in the products. Before I can test it, can you try it without the prefetch_related (i.e. type it as products: ListConnectionWithTotalCount[ProductType] = strawberry.django.connection() only)?

The reason I say this is because you don't want to prefetch_related a connection, as it will be sliced and the prefetched data will be thrown away (so it is even worse performance-wise). This can also be interfering in the optimizer since it will skip already prefetched querysets

bellini666 avatar Aug 24 '23 20:08 bellini666

sure, without prefetch_related I am getting 3 SQL queries

SELECT "app_user"."id"
  FROM "app_user"
 WHERE "app_user"."id" = '1'
 LIMIT 21
SELECT "app_product"."id",
       "app_product"."name"
  FROM "app_product"
 WHERE "app_product"."user_id" = '1'
 LIMIT 101
SELECT "app_product"."id",
       "app_product"."user_id"
  FROM "app_product"
 WHERE "app_product"."id" = '1'
 LIMIT 21

Is it possible to get just one query?

tasiotas avatar Aug 24 '23 20:08 tasiotas

1 query is impossible, but 2 is what should be happening. One for the user and another one for the products list.

Now there seems to be an issue with the query optimizer here. The second query is the relay's connection one, and it should have selected the user_id as well as you can see in this condition. I think it did not do that because it actually doesn't know it is being retrieved using a relation through the user. I'll see what I can do about optimizing that

bellini666 avatar Aug 24 '23 21:08 bellini666

Hi @bellini666,

Do you think you will have time and energy to look into optimizer related issues this year? It's one of the last issues I need to address in my backend.

Currently I have few queries with 3 or even 4 level deep nested connections. Such query takes seconds to resolve without prefetching data.

Or is there a workaround for such cases? Can we disable optimizer and write query manually? I'm not sure it that's is even feasible solution.

Thank you! 🙏

tasiotas avatar Jan 04 '24 05:01 tasiotas

Hey @tasiotas ,

I'm hoping to have the time and energy to look into it ASAP. If I were to give you an estimated ETA I would say probably in this years Q1 still (but more towards the end of it) :)

bellini666 avatar Jan 06 '24 15:01 bellini666

Glad to hear that, thank you.

tasiotas avatar Jan 07 '24 02:01 tasiotas

@tasiotas i'm running into this exact same scenario. Would be very helpful if you could share any workarounds you're using.

rsomani95 avatar Feb 09 '24 20:02 rsomani95

I don't have any. Luckily I can wait for solution from @bellini666

tasiotas avatar Feb 09 '24 20:02 tasiotas