strawberry icon indicating copy to clipboard operation
strawberry copied to clipboard

Allow access to StrawberryField.metadata in a query level.

Open shmoon-kr opened this issue 5 months ago • 15 comments

  • [V ] Alteration (enhancement/optimization) of existing feature(s)

Description

I am developing a product with strawberry_graphql and strawberry_graphql_django. While developing, I need a pagination function, so I am using the pagination function provided by strawberry_graphql_django, I want to add a function that returns the total number of items together.

I think it's easy to do with the following modification by using metadata defined in strawberry graphql, The problem is that there is no way to make the values defined in self.metadata visible in the query.

Is there any way to make the metadata visible in the query?

    def apply_pagination(
        self,
        queryset: _QS,
        pagination: Optional[object] = None,
    ) -> _QS:
        self.metadata = { "total_count": queryset.count() }
        return apply(pagination, queryset)

Upvote & Fund

  • We're using Polar.sh so you can upvote and help fund this issue.
  • We receive the funding once the issue is completed & confirmed by you.
  • Thank you in advance for helping prioritize & fund our backlog.
Fund with Polar

shmoon-kr avatar Jan 29 '24 01:01 shmoon-kr

@bellini666 how do you handle this use case in django?

patrick91 avatar Jan 29 '24 11:01 patrick91

Another idea is to add a boolean flag to the parameters of strawberry_django.fields() that indicates a counter query.

For example, one can define a counter query like the following:

@strawberry.type
class Query:
    fruit_count: int = strawberry_django.field(count=True)

And in a filter apply resolver,

def apply(
    filters: Optional[object],
    queryset: _QS,
    info: Optional[Info] = None,
    pk: Optional[Any] = None,
    count: Bool = False,
) -> _QS:
    if pk not in (None, strawberry.UNSET):  # noqa: PLR6201
        queryset = queryset.filter(pk=pk)

    if filters in (None, strawberry.UNSET) or not has_django_definition(filters):  # noqa: PLR6201
        return queryset.count() if count else queryset

    ...

    return queryset.count() if count else queryset

Then a user can make counter function easily with the default field resolver.

I'm not familiar with the structure of the library but I think you can do easily with the above idea.

shmoon-kr avatar Jan 29 '24 12:01 shmoon-kr

Is there any way to make the metadata visible in the query?

@shmoon-kr what exactly do you mean by this? I'm not sure if I got it from your example.

bellini666 avatar Jan 29 '24 16:01 bellini666

@bellini666

Please refer to the following link: https://strawberry.rocks/docs/guides/pagination/overview

In the documentation above, there is an example of using metadata to return a total count when implementing the pagination function.

According to the above article, "It may be useful to add metadata to the result. For example, the metadata may specify how many items there are in total, so that the client knows what the greatest offset value can be." and gives the following query result example.

{
  "users": [
    ...
  ]
  "metadata": {
    "count": 25
  }
}

This is exactly what I need, but I don't have an example of how to add metadata to the results in a query resolver and how to include it in the resolver.

I checked the source code and saw that the StrawberryField class has a member variable called metadata, so I assumed I could assign it to this variable, and since the StrawberryDjangoField class inherits from the StrawberryField class, I thought I could do the same.

As a test, I tried to modify the pagination apply resolver defined in strawberry_django to include the metadata, but that didn't make the metadata appear in the query result like above. So, I'm asking how to get the metadata to appear in the query results, like the results of my query.

Thank you for your interest in this case.

--

shmoon-kr avatar Jan 29 '24 22:01 shmoon-kr

@shmoon-kr oh, I see what you mean

You can probably do something similar to what I do on strawberry-graphql-django for ListConnectionWithTotalCound: https://github.com/strawberry-graphql/strawberry-graphql-django/blob/main/strawberry_django/relay.py#L51

In there I overridden resolve_connection to store nodes at the object, which is basically the django's QuerySet, and then I use it in the total_count resolver, which could instead return a Metadata object.

Of course you would need to change your return value and probably implement your own resolver for that.

Let me know if that points you to the right direction. Otherwise feel free to share more code in here so I can give you more advices

bellini666 avatar Feb 03 '24 15:02 bellini666

Hi all, I'm also using strawberry-django and have a similar need for pagination metadata without using Relay. My current approach is a manually resolve my list queries using a generic "wrapper" node.

@strawberry.type
class GenericListNode(Generic[T]):
    page_meta: PageMetaNode
    results: List[T]


def get_generic_list(
    root,
    info: Info,
    model: Type[Model],
    node: Type[T],
    pagination: Optional[OffsetPaginationInput] = strawberry.UNSET,
    filters = None,
    order = None,
) -> GenericListNode[T]:
    """
    Provides pagination metadata with support for Strawberry Django's built in pagination, sorting, and ordering.
    """

    # for some reason trying to set pagination as a default argument does not work
    if not pagination:
        pagination = OffsetPaginationInput(offset=0, limit=30)

    qs: List[T] = model.objects.all()

    get_queryset = getattr(node, "get_queryset", None)
    if callable(get_queryset):
        qs = node.get_queryset(qs, info)

    qs: List[T] = strawberry_django.ordering.apply(order, qs)
    qs: List[T] = strawberry_django.filters.apply(filters, qs)

    total_count = qs.count()

    qs: List[T] = strawberry_django.pagination.apply(pagination, qs)

    return GenericListNode(
        page_meta=PageMetaNode(
            total_count=total_count,
        ),
        results=qs,
    )

# Usage
def get_foos(
    root,
    info: Info,
    pagination: Optional[OffsetPaginationInput] = None,
    filters: Optional[FooFilter] = None,
    order: Optional[FooOrder] = None,
) -> GenericListNode[UserNode]:
    return get_generic_list(root, info, FooModel, Foo, pagination, filter, order)

@strawberry.type(name="Query")
class FooQuery:
    foos: GenericListNode[Foo] = strawberry.field(resolver=get_foos)

This approach isn't perfect, since you still have to write a resolver function for each query and can't take advantage of strawberry-django's automatic list resolver.

Therefore, I've been trying encapsulate this logic into a field extension. Unfortunatly, this results in a performance issues because the next_ function in resolve evaluates the entire queryset (i think due to this line). This makes queries on large tables very slow - even when applying aggressive pagination.

Extension:

class PaginationExtension(FieldExtension):
    def apply(self, field: StrawberryField) -> None:
        type_argument = getattr(field.type, "of_type", None)

        if not isinstance(field.type, StrawberryList) or not type_argument:
            raise TypeError("Invalid type for PaginationExtension")

        # check if a pagination argument is already provided - either as a resolver argument or setting `pagination=True` with `strawberry_django.type`
        pagination_argument_exists = False
        for argument in field.arguments:
            if (
                argument.python_name == "pagination"
                and getattr(argument.type, "of_type", None) == OffsetPaginationInput
            ):
                pagination_argument_exists = True
                break

        if not pagination_argument_exists:
            raise ValueError("Missing pagination argument")

        field.type = GenericListNode[type_argument]

    def resolve(
        self,
        next_: Callable[..., Any],
        source: Any,
        info: Info,
        pagination: Optional[OffsetPaginationInput] = None,
        **kwargs,
    ):
        if not pagination:
            pagination = OffsetPaginationInput(offset=0, limit=30)

        # This is really slow because the default Strawberry Django resolver attempts to fetch and evaluate the entire queryset
        # before applying pagination. Source: https://github.com/strawberry-graphql/strawberry-graphql-django/blob/35a85e7d1b9d3cb30a91b167dbfafaaa22453e75/strawberry_django/resolvers.py#L29-L34
        results = next_(source, info, **kwargs)

        paginated_results = strawberry_django.pagination.apply(pagination, results)

        return GenericListNode(
            page_meta=PageMetaNode(
                total_count=results.count(),
            ),
            results=paginated_results,
        )

# Usage
@strawberry.type(name="Query")
class FooQuery:
    foos: list[Foo] = strawberry_django.field(
        extensions=[PaginationExtension()]
    )

@bellini666 would it be possible to prevent or delay the queryset from being evaluated by next_ in a field extension? I'd also love to get your thoughts on my approach in general.

fireteam99 avatar Feb 06 '24 07:02 fireteam99

@fireteam99

would it be possible to prevent or delay the queryset from being evaluated by next_ in a field extension?

Yeah, that's the optimizer doing its thing =P

The relay connection also has a similar issue, and we workaround that internally by not doing that fetch you pointed out in here: https://github.com/strawberry-graphql/strawberry-django/blob/a4cc6a77ad221b6eb968685e9dfa54e9f82c5f51/strawberry_django/fields/field.py#L250

I would say that a very "ugly" workaround would be for you to do field.is_connection = True on your apply method. That might make it return the queryset without fetching it, I'm just not sure if it is going to have any other unwanted side effects...

A better fix would be to add a way to define if the field should do the prefetch or not, which by default uses the is_connection as I mentioned, but can also be overriden by an extension? What do you think?

I'd also love to get your thoughts on my approach in general.

Your approach seems fine :)

bellini666 avatar Feb 07 '24 22:02 bellini666

@bellini666 thanks for the explanation!

The relay connection also has a similar issue, and we workaround that internally by not doing that fetch you pointed out in here: https://github.com/strawberry-graphql/strawberry-django/blob/a4cc6a77ad221b6eb968685e9dfa54e9f82c5f51/strawberry_django/fields/field.py#L250

Ahh i see. Can confirm setting field.is_connection = True does indeed fix the performance issues! I didn't notice any issues with ordering, filter, prefetching, or any unwanted side affects for that matter. From what I understand, the qs._fetch_all() is used by the internal optimizer? I'm curious why I didn't encounter any n+1 issues even after disabling it.

A better fix would be to add a way to define if the field should do the prefetch or not, which by default uses the is_connection as I mentioned, but can also be overriden by an extension? What do you think?

Agreed, this would be a much better solution! Is this something we can override in an extension right now? Or will it require a code change in the library?

fireteam99 avatar Feb 15 '24 22:02 fireteam99

From what I understand, the qs._fetch_all() is used by the internal optimizer? I'm curious why I didn't encounter any n+1 issues even after disabling it.

Actually the opposite. qs._fetch_all() is what django does internally to cache the results, so after that iterating over the queryset will use the already fetched results instead of querying the db again.

That is done like that because when running async, you can only access the database from the main thread (usually using sync_to_async), otherwise you get something like this:

Screenshot 2024-02-16 at 11 00 38

Agreed, this would be a much better solution! Is this something we can override in an extension right now? Or will it require a code change in the library?

There isn't a way right now, but I'll try to add an api for that during the weekend, or maybe try to remove the _fetch_all() call and solve that in another way. I have an idea that might work...

bellini666 avatar Feb 16 '24 14:02 bellini666

Actually the opposite. qs._fetch_all() is what django does internally to cache the results, so after that iterating over the queryset will use the already fetched results instead of querying the db again.

That is done like that because when running async, you can only access the database from the main thread (usually using sync_to_async), otherwise you get something like this:

Oh interesting! Does that mean disabling qs._fetch_all() prevents the query from running async?

There isn't a way right now, but I'll try to add an api for that during the weekend, or maybe try to remove the _fetch_all() call and solve that in another way. I have an idea that might work...

Awesome, thank you so much!

fireteam99 avatar Feb 17 '24 06:02 fireteam99

Oh interesting! Does that mean disabling qs._fetch_all() prevents the query from running async?

It prevents you from returning querysets to be resolved at graphql-core level, as iterating over them without them already being cached will trigger a db query T_T

bellini666 avatar Feb 17 '24 12:02 bellini666

Hey @fireteam99 , I think this should allow you to solve your issue: https://github.com/strawberry-graphql/strawberry-django/pull/481

When it is merged, you just need to do field.disable_fetch_list_results = True in your extension apply method

Let me know what you think.

There isn't a way right now, but I'll try to add an api for that during the weekend, or maybe try to remove the _fetch_all() call and solve that in another way. I have an idea that might work...

Also, regarding this... I couldn't find a way right now, but it seems that the next graphql-core version, once it is released, will allow us to remove that workaround once and for all :)

bellini666 avatar Feb 17 '24 13:02 bellini666

Hi @bellini666, thanks for adding this feature! This is a lot cleaner than the connection workaround.

Also, regarding this... I couldn't find a way right now, but it seems that the next graphql-core version, once it is released, will allow us to remove that workaround once and for all :)

Gotcha gotcha, so this shouldn't be an issue as long as I'm not using async?

fireteam99 avatar Feb 20 '24 02:02 fireteam99

Gotcha gotcha, so this shouldn't be an issue as long as I'm not using async?

No. Although strawberry-django will do the fetching for both, unless you set the disable_fetch_list_results attribute to True

bellini666 avatar Feb 20 '24 13:02 bellini666

I see, thanks for the explanation! Glad there will be a more permanent fix coming in the future

fireteam99 avatar Feb 21 '24 08:02 fireteam99