strawberry
strawberry copied to clipboard
Allow access to StrawberryField.metadata in a query level.
- [V ] Alteration (enhancement/optimization) of existing feature(s)
Description
I am developing a product with strawberry_graphql and strawberry_graphql_django. While developing, I need a pagination function, so I am using the pagination function provided by strawberry_graphql_django, I want to add a function that returns the total number of items together.
I think it's easy to do with the following modification by using metadata defined in strawberry graphql, The problem is that there is no way to make the values defined in self.metadata visible in the query.
Is there any way to make the metadata visible in the query?
def apply_pagination(
self,
queryset: _QS,
pagination: Optional[object] = None,
) -> _QS:
self.metadata = { "total_count": queryset.count() }
return apply(pagination, queryset)
Upvote & Fund
- We're using Polar.sh so you can upvote and help fund this issue.
- We receive the funding once the issue is completed & confirmed by you.
- Thank you in advance for helping prioritize & fund our backlog.
@bellini666 how do you handle this use case in django?
Another idea is to add a boolean flag to the parameters of strawberry_django.fields() that indicates a counter query.
For example, one can define a counter query like the following:
@strawberry.type
class Query:
fruit_count: int = strawberry_django.field(count=True)
And in a filter apply resolver,
def apply(
filters: Optional[object],
queryset: _QS,
info: Optional[Info] = None,
pk: Optional[Any] = None,
count: Bool = False,
) -> _QS:
if pk not in (None, strawberry.UNSET): # noqa: PLR6201
queryset = queryset.filter(pk=pk)
if filters in (None, strawberry.UNSET) or not has_django_definition(filters): # noqa: PLR6201
return queryset.count() if count else queryset
...
return queryset.count() if count else queryset
Then a user can make counter function easily with the default field resolver.
I'm not familiar with the structure of the library but I think you can do easily with the above idea.
Is there any way to make the metadata visible in the query?
@shmoon-kr what exactly do you mean by this? I'm not sure if I got it from your example.
@bellini666
Please refer to the following link: https://strawberry.rocks/docs/guides/pagination/overview
In the documentation above, there is an example of using metadata to return a total count when implementing the pagination function.
According to the above article, "It may be useful to add metadata to the result. For example, the metadata may specify how many items there are in total, so that the client knows what the greatest offset value can be." and gives the following query result example.
{
"users": [
...
]
"metadata": {
"count": 25
}
}
This is exactly what I need, but I don't have an example of how to add metadata to the results in a query resolver and how to include it in the resolver.
I checked the source code and saw that the StrawberryField class has a member variable called metadata, so I assumed I could assign it to this variable, and since the StrawberryDjangoField class inherits from the StrawberryField class, I thought I could do the same.
As a test, I tried to modify the pagination apply resolver defined in strawberry_django to include the metadata, but that didn't make the metadata appear in the query result like above. So, I'm asking how to get the metadata to appear in the query results, like the results of my query.
Thank you for your interest in this case.
--
@shmoon-kr oh, I see what you mean
You can probably do something similar to what I do on strawberry-graphql-django for ListConnectionWithTotalCound
: https://github.com/strawberry-graphql/strawberry-graphql-django/blob/main/strawberry_django/relay.py#L51
In there I overridden resolve_connection
to store nodes
at the object, which is basically the django's QuerySet
, and then I use it in the total_count
resolver, which could instead return a Metadata
object.
Of course you would need to change your return value and probably implement your own resolver for that.
Let me know if that points you to the right direction. Otherwise feel free to share more code in here so I can give you more advices
Hi all, I'm also using strawberry-django and have a similar need for pagination metadata without using Relay. My current approach is a manually resolve my list queries using a generic "wrapper" node.
@strawberry.type
class GenericListNode(Generic[T]):
page_meta: PageMetaNode
results: List[T]
def get_generic_list(
root,
info: Info,
model: Type[Model],
node: Type[T],
pagination: Optional[OffsetPaginationInput] = strawberry.UNSET,
filters = None,
order = None,
) -> GenericListNode[T]:
"""
Provides pagination metadata with support for Strawberry Django's built in pagination, sorting, and ordering.
"""
# for some reason trying to set pagination as a default argument does not work
if not pagination:
pagination = OffsetPaginationInput(offset=0, limit=30)
qs: List[T] = model.objects.all()
get_queryset = getattr(node, "get_queryset", None)
if callable(get_queryset):
qs = node.get_queryset(qs, info)
qs: List[T] = strawberry_django.ordering.apply(order, qs)
qs: List[T] = strawberry_django.filters.apply(filters, qs)
total_count = qs.count()
qs: List[T] = strawberry_django.pagination.apply(pagination, qs)
return GenericListNode(
page_meta=PageMetaNode(
total_count=total_count,
),
results=qs,
)
# Usage
def get_foos(
root,
info: Info,
pagination: Optional[OffsetPaginationInput] = None,
filters: Optional[FooFilter] = None,
order: Optional[FooOrder] = None,
) -> GenericListNode[UserNode]:
return get_generic_list(root, info, FooModel, Foo, pagination, filter, order)
@strawberry.type(name="Query")
class FooQuery:
foos: GenericListNode[Foo] = strawberry.field(resolver=get_foos)
This approach isn't perfect, since you still have to write a resolver function for each query and can't take advantage of strawberry-django's automatic list resolver.
Therefore, I've been trying encapsulate this logic into a field extension. Unfortunatly, this results in a performance issues because the next_
function in resolve
evaluates the entire queryset (i think due to this line). This makes queries on large tables very slow - even when applying aggressive pagination.
Extension:
class PaginationExtension(FieldExtension):
def apply(self, field: StrawberryField) -> None:
type_argument = getattr(field.type, "of_type", None)
if not isinstance(field.type, StrawberryList) or not type_argument:
raise TypeError("Invalid type for PaginationExtension")
# check if a pagination argument is already provided - either as a resolver argument or setting `pagination=True` with `strawberry_django.type`
pagination_argument_exists = False
for argument in field.arguments:
if (
argument.python_name == "pagination"
and getattr(argument.type, "of_type", None) == OffsetPaginationInput
):
pagination_argument_exists = True
break
if not pagination_argument_exists:
raise ValueError("Missing pagination argument")
field.type = GenericListNode[type_argument]
def resolve(
self,
next_: Callable[..., Any],
source: Any,
info: Info,
pagination: Optional[OffsetPaginationInput] = None,
**kwargs,
):
if not pagination:
pagination = OffsetPaginationInput(offset=0, limit=30)
# This is really slow because the default Strawberry Django resolver attempts to fetch and evaluate the entire queryset
# before applying pagination. Source: https://github.com/strawberry-graphql/strawberry-graphql-django/blob/35a85e7d1b9d3cb30a91b167dbfafaaa22453e75/strawberry_django/resolvers.py#L29-L34
results = next_(source, info, **kwargs)
paginated_results = strawberry_django.pagination.apply(pagination, results)
return GenericListNode(
page_meta=PageMetaNode(
total_count=results.count(),
),
results=paginated_results,
)
# Usage
@strawberry.type(name="Query")
class FooQuery:
foos: list[Foo] = strawberry_django.field(
extensions=[PaginationExtension()]
)
@bellini666 would it be possible to prevent or delay the queryset from being evaluated by next_
in a field extension? I'd also love to get your thoughts on my approach in general.
@fireteam99
would it be possible to prevent or delay the queryset from being evaluated by next_ in a field extension?
Yeah, that's the optimizer doing its thing =P
The relay connection also has a similar issue, and we workaround that internally by not doing that fetch you pointed out in here: https://github.com/strawberry-graphql/strawberry-django/blob/a4cc6a77ad221b6eb968685e9dfa54e9f82c5f51/strawberry_django/fields/field.py#L250
I would say that a very "ugly" workaround would be for you to do field.is_connection = True
on your apply
method. That might make it return the queryset without fetching it, I'm just not sure if it is going to have any other unwanted side effects...
A better fix would be to add a way to define if the field should do the prefetch or not, which by default uses the is_connection
as I mentioned, but can also be overriden by an extension? What do you think?
I'd also love to get your thoughts on my approach in general.
Your approach seems fine :)
@bellini666 thanks for the explanation!
The relay connection also has a similar issue, and we workaround that internally by not doing that fetch you pointed out in here: https://github.com/strawberry-graphql/strawberry-django/blob/a4cc6a77ad221b6eb968685e9dfa54e9f82c5f51/strawberry_django/fields/field.py#L250
Ahh i see. Can confirm setting field.is_connection = True
does indeed fix the performance issues! I didn't notice any issues with ordering, filter, prefetching, or any unwanted side affects for that matter. From what I understand, the qs._fetch_all()
is used by the internal optimizer? I'm curious why I didn't encounter any n+1 issues even after disabling it.
A better fix would be to add a way to define if the field should do the prefetch or not, which by default uses the is_connection as I mentioned, but can also be overriden by an extension? What do you think?
Agreed, this would be a much better solution! Is this something we can override in an extension right now? Or will it require a code change in the library?
From what I understand, the qs._fetch_all() is used by the internal optimizer? I'm curious why I didn't encounter any n+1 issues even after disabling it.
Actually the opposite. qs._fetch_all()
is what django does internally to cache the results, so after that iterating over the queryset will use the already fetched results instead of querying the db again.
That is done like that because when running async, you can only access the database from the main thread (usually using sync_to_async
), otherwise you get something like this:
Agreed, this would be a much better solution! Is this something we can override in an extension right now? Or will it require a code change in the library?
There isn't a way right now, but I'll try to add an api for that during the weekend, or maybe try to remove the _fetch_all()
call and solve that in another way. I have an idea that might work...
Actually the opposite. qs._fetch_all() is what django does internally to cache the results, so after that iterating over the queryset will use the already fetched results instead of querying the db again.
That is done like that because when running async, you can only access the database from the main thread (usually using sync_to_async), otherwise you get something like this:
Oh interesting! Does that mean disabling qs._fetch_all()
prevents the query from running async?
There isn't a way right now, but I'll try to add an api for that during the weekend, or maybe try to remove the _fetch_all() call and solve that in another way. I have an idea that might work...
Awesome, thank you so much!
Oh interesting! Does that mean disabling qs._fetch_all() prevents the query from running async?
It prevents you from returning querysets to be resolved at graphql-core level, as iterating over them without them already being cached will trigger a db query T_T
Hey @fireteam99 , I think this should allow you to solve your issue: https://github.com/strawberry-graphql/strawberry-django/pull/481
When it is merged, you just need to do field.disable_fetch_list_results = True
in your extension apply
method
Let me know what you think.
There isn't a way right now, but I'll try to add an api for that during the weekend, or maybe try to remove the _fetch_all() call and solve that in another way. I have an idea that might work...
Also, regarding this... I couldn't find a way right now, but it seems that the next graphql-core version, once it is released, will allow us to remove that workaround once and for all :)
Hi @bellini666, thanks for adding this feature! This is a lot cleaner than the connection workaround.
Also, regarding this... I couldn't find a way right now, but it seems that the next graphql-core version, once it is released, will allow us to remove that workaround once and for all :)
Gotcha gotcha, so this shouldn't be an issue as long as I'm not using async?
Gotcha gotcha, so this shouldn't be an issue as long as I'm not using async?
No. Although strawberry-django will do the fetching for both, unless you set the disable_fetch_list_results
attribute to True
I see, thanks for the explanation! Glad there will be a more permanent fix coming in the future