strawberry-django Strawberry and Relay potential improvements

Strawberry and Relay potential improvements

Open Mapiarz opened this issue 1 year ago • 9 comments

Hi!

I would like to bring to your attention a problem I'm facing at my company. The tl;dr is that there's no clean way to do non relay based pagination and that multiple features seem to be tightly coupled with relay. I get into details below.

Graphql seems to be very opinionated about what's the best way to do pagination, you can see this here: https://graphql.org/learn/pagination/#pagination-and-edges. That is fine if you're Facebook and you're doing infinitely scrolling pages of spam and ads. That's not what I've found myself doing in the past 14 years of my programming experience so far. At my current company we have 2 good use cases for relay based pagination: support messaging (like a chat app) and an activity stream log. That's 2 cases out of dozens where relay is NOT a good candidate. What we need is:

To know the total count of items
Have the ability to jump to any given page of items (e.g. if there are 10 pages I want to be able to go from page 1 to page 6 immediately)
Filter/order the items arbitrarily

This seemingly simple pagination approach which has been used for decades is really hard to do with strawberry graphql django. We cannot use the paginated field as we wouldn't know the total count. We can't use relay connection and ListConnectionWithTotalCount as we lose the ability to jump to a given page. We cannot use an offset based cursor as that would make it non opaque - the API client would have to know what the cursor is and how to calculate it in order to jump to a specific page. Relay spec is pretty clear about the fact that cursors should be opaque:

As a reminder that the cursors are opaque and that their format should not be relied upon, we suggest base64 encoding them.

So what does that leave us with? Well, we could implement our custom extension and pagination logic right? RIGHT?

1Z6JhBw

Too many feautres of strawberry-graphql-django are tightly coupled with relay spec. First and foremost the optimizer. If we don't inherit from the Connection class the optimizer will never kick in. Furthermore, from our testing, it seems that more functionalities are tightly coupled with the base Connection class, e.g. see references to is_connection for instance. There could be more things that fall apart without using Connection class, I just don't know.

Are there any other options that I haven't mentioned? Maybe I'm missing something obvious here? For the time being, we are doing a total hack to do this kind of pagination, see for yourself:

@strawberry.type
class PaginatedConnection(strawberry_django.relay.ListConnectionWithTotalCount[relay_types.NodeType]):
    @classmethod
    def resolve_connection(
        cls,
        nodes: relay_types.NodeIterableType[relay_types.NodeType],
        *,
        info: Optional[types.Info] = None,
        offset: int = 0,
        limit: int = 1000,
        **kwargs: Any,
    ) -> Self:
        nodes_queryset = cast(django_models.QuerySet[django_models.Model], nodes)
        sliced_nodes = nodes_queryset[offset : offset + limit]

        edges = [
            strawberry.relay.Edge[relay_types.NodeType](cursor="", node=cast(relay_types.NodeType, v))
            for v in sliced_nodes
        ]
        page_info = strawberry.relay.PageInfo(
            start_cursor=None,
            end_cursor=None,
            has_previous_page=False,
            has_next_page=False,
        )
        return cls(page_info=page_info, edges=edges, nodes=nodes)


class PaginatedConnectionExtension(strawberry_django_field.StrawberryDjangoConnectionExtension):
    connection_type: Type[PaginatedConnection[relay_types.Node]]

    def apply(self, field: strawberry_django_field.StrawberryDjangoField) -> None:
        super().apply(field)
        # Drop relay arguments
        field.arguments = [a for a in field.arguments if a.python_name not in ["after", "before", "first", "last"]]
        # Add offset/limit arguments
        field.arguments = [
            *field.arguments,
            arguments.StrawberryArgument(
                python_name="offset",
                graphql_name=None,
                type_annotation=annotation.StrawberryAnnotation(int),
                description=("Number of rows that are omitted before the beginning of the result set"),
                default=0,
            ),
            arguments.StrawberryArgument(
                python_name="limit",
                graphql_name=None,
                type_annotation=annotation.StrawberryAnnotation(int),
                description=(
                    "Limits the number of values returned from the result set. Negative values imply no limit."
                ),
                default=1000,
            ),
        ]

    def resolve(
        self,
        next_: field_extension.SyncExtensionResolver,
        source: Any,
        info: types.Info,
        *,
        limit: int,
        offset: int,
        **kwargs: Any,
    ) -> Any:
        # Copy pasted from base class with adjusted signature for limit/offset
        assert self.connection_type is not None
        nodes = cast(Iterable[relay_types.Node], next_(source, info, **kwargs))

        # We have a single resolver for both sync and async, so we need to check if
        # nodes is awaitable or not and resolve it accordingly
        if inspect.isawaitable(nodes):

            async def async_resolver():
                resolved = self.connection_type.resolve_connection(
                    await nodes,
                    info=info,
                    limit=limit,
                    offset=offset,
                )
                if inspect.isawaitable(resolved):
                    resolved = await resolved

                return resolved

            return async_resolver()

        return self.connection_type.resolve_connection(
            nodes,
            info=info,
            limit=limit,
            offset=offset,
        )

To sum up, how could this pagination mechanism be best achieved with strawberry django? Looking forward, what kind of improvements could we do not to overly rely on the Relay spec? Probably the hardest thing to get right (and also the most important) would be the optimizer.

I hope we can have a fruitful discussion!

Upvote & Fund

We're using Polar.sh so you can upvote and help fund this issue.
We receive the funding once the issue is completed & confirmed by you.
Thank you in advance for helping prioritize & fund our backlog.

Jul 14 '23 07:07 Mapiarz

Hi @Mapiarz ,

I get what you mean. Indeed the current implementation, and specially the optimizer, can either be used with the default limit/offset pagination, which is pretty basic and doesn't provide a way to have a totalCount attribute for jumping in a specific page, or to do a cursor based pagination, in which we are supporting the relay spec by default in strawberry.

But having said all of that, I would say that you can actually use ListConnectionWithTotalCount for pagination "with pages". In my company we actually do a lot of those using it. Let me explain what we do:

To know the total count of items

The totalCount together with the client knowing how many rows/page it wants is enough to know the number of pages you have.

Have the ability to jump to any given page of items (e.g. if there are 10 pages I want to be able to go from page 1 to page 6 immediately)

The idea of the cursor being something "opaque" is to ensure that the server itself can use any info in there that will allow it to do the pagination, but that doesn't mean that you cannot use indexes in those to do a limit/offset approach!

In fact, that is actually what ListConnection does! If you look at a cursor in the pagination, it will be a string like YXJyYXljb25uZWN0aW9uOjEyMw==, which if you base64decode it give you arrayconnection:123. ListConnection translates that to "the result at index 123 in the list" or results[123].

Some examples on how you can translate that to "pages". Consider an example where you have a queryset that if not limited would return 1050 items and you want 100 items/page:

By retrieving the totalCount you know that you have 11 pages, by doing math.ceil(1050/100)
To jump to an specific page, you need to pass the limit/offset to it. In this example, to go to page 3 for example you want an offset of 300 and a limit of 100, meaning you can call the connection passing first: 100 and after: base64encode("arrayconnection:299")
You can even tell which page you are if you don't have that info by looking at pageInfo.startCursor. You decode64 it, split by : and take the second element as the index. If you divide the start cursor's index by the number of pages (100 in this example), you will have the page. For example, if the startCursor is 300, then 300 / 100 = 3

Filter/order the items arbitrarily

That is already how it works :)

You can pass filters and order to strawberry_django.connection() and it will add filtering/ordering to the query, allowing the client to filter/order the results for you and paginate on top of that. The totalCount will be the COUNT considering the filters, and the pagination will take both the filters and ordering in consideration

Jul 14 '23 13:07 bellini666

We don't use React as our front-end, but using the relay implementation of this library gets us as close to our goal so far. At our company, I've been trying to put together something that utilizes many features of strawberry-graphql-django as it stands, and keep hitting gotchas.

Documentation isn't in line with reality. I'm spending more time reading code and at this point, adding logging at various places to better understand what it's doing, only to be left with more questions about what strawberry-graphql-django should or shouldn't do in whichever circumstance.
Using relay with filters and ordering is mostly working. I was even able to get nested relay resolvers to work but only if I use the following syntax:

def checker(info: Info, user: UserType):
    def perm_checker(perm: PermDefinition, obj: Any) -> bool:
        # For our needs, we need AND logic not OR logic
        return user.has_perm(perm.perm) and user.has_perm(perm.perm, obj)

    return perm_checker

@strawberry.django.type(models.Budget, filters=BudgetFilter, order=BudgetOrder)
class Budget(relay.Node):
    id: auto # <-- MAP_AUTO_ID_AS_GLOBAL_ID = True has no effect. This will always throw an error.
    id: relay.NodeID[uuid.UUID] # <- this works, but the front-end has to base64 decode and parse out model's PK to use with BudgetFilter
    id: relay.GlobalID # <-- this is the model's PK and works for us.
    ...
    @strawberry.django.connection(
        ListConnectionWithTotalCount["Income"],
        prefetch_related=["incomes", "group_permissions", "user_permissions"], # <-- have to hint the optimizer
        extensions=[HasRetvalPerm("budget.view_income", obj_perm_checker=checker)],
    )
    def incomes(self, root: models.Income) -> list["Income"]:
        return root.incomes.all()

As you can see this deviates from docs because docs say return type of the resolver is a QuerySet[SomeModel] which throws an error. On top of this, trying to convert this to an async resolver was unsuccessful because the Has*Perm extensions don't appear to be async friendly, but I'll get to that in a moment.

Strangly, if I re-write the field to be:

incomes: ListConnectionWithTotalCount["Income"] = strawberry.django.connection(prefetch_related=[...], extensions=[...]) # same prefetch and extensions as above

the optimizer hint is ignored in this case, whereas it works fine if using the custom resolver.

Permissions. Docs mention HasObjPerm which never appeared to enforce object permissions. In the current version HasObjPerm no longer exists and I think this is because it has been replaced by HasRetvalPerm? The default object permissions checker does: user.has_perm(perm.perm) or user.has_perm(perm.perm, obj) which checks if a user has a global permission or object permission. This means when using guardian, budget.view_income object permission is only enforced if a user does not have budget.view_income global permission. In our case, we need AND logic for this (which seems to be the generally accepted behavior of guardian when used with Django REST Framework). This is why I'm using my own checker in the code above (it simply changes the logic from OR to AND) and it works but ONLY in nested relay connections. Here's why:

# strawberry_django/permissions.py:795ish
    def _resolve_obj(
        self,
        source: Any,
        user: UserType,
        obj: Any,
        *,
        info: Info,
    ) -> Any:
        context = perm_context.get()
        if context.is_safe: # <-- this will always be True if you're at the root of a Schema???
            return obj

        # And thus the custom checker passed as obj_perm_checker to the extension is never reached

IsAuthenticated when using guardian in default configuration has no effect because the AnonymousUser is always authenticated and NOT is_anonymous - See: https://django-guardian.readthedocs.io/en/stable/configuration.html - Again this is a situation of "what is the intention of strawberry-django vs. what the docs say" leaving the user to either implement their own IsAuthenticated or configure guardian with ANONYMOUS_USER_NAME = None

Jul 16 '23 20:07 salcedo

Thanks @bellini666.

The idea of the cursor being something "opaque" is to ensure that the server itself can use any info in there that will allow it to do the pagination, but that doesn't mean that you cannot use indexes in those to do a limit/offset approach!

I have to disagree, that's not the reason why the cursor has to be opaque. In short, it's so that clients can be decoupled from implementation details of the backend. Per relay spec, cursors should be basically magic strings that the clients use to get to the next/prev page. The approach you described is in direct violation of the spec, as the clients have to understand what the cursor is and have to effectively duplicate logic that the backend API implementation has. If that's the case, why even base64 encode it?

Having said that, this approach is a workaround that we will probably end up using. But we can do that only because our API is private and we control the clients. This would never work in an environment where the clients are not controlled by the API implementors. It violates the Relay spec and such an approach should not be recommended to the users.

Jul 17 '23 08:07 Mapiarz

@Mapiarz

I have to disagree, that's not the reason why the cursor has to be opaque. In short, it's so that clients can be decoupled from implementation details of the backend. Per relay spec, cursors should be basically magic strings that the clients use to get to the next/prev page. The approach you described is in direct violation of the spec, as the clients have to understand what the cursor is and have to effectively duplicate logic that the backend API implementation has. If that's the case, why even base64 encode it?

Exactly. "cursors should be basically magic strings that the clients use to get to the next/prev page" means that the client doesn't need to know what is in there for pagination, but the server itself knows.

Indeed the approach I described is a violation of the spec, but as long as you are using ListConnection or a subclass of it, it is a implementation detail that allows you to make that work.

And for curiosity, the official graphql-relay-js does the same for its ArrayConnection, which is a limit/offset approach and cursors are converted to a base64 version of arrayconnection:1234 (I took the inspiration from there for ListConnection)

Having said that, this approach is a workaround that we will probably end up using. But we can do that only because our API is private and we control the clients. This would never work in an environment where the clients are not controlled by the API implementors. It violates the Relay spec and such an approach should not be recommended to the users.

Yes, but it all depends on what you want to expose to end users and their needs.

If your users need a way to "jump to pages" and you don't want to document such a workaround to them, then you should probably not be exposing that using a relay connection but something else instead.

But if everything they need is just a way to paginate through stuff, and even know the totalCount for the results, then it should be enough. For example, the github's graphql api implements the relay spec and uses connections for pagination.

Having said all of that, the thing here is that the relay spec is not made with the assumption that you can "jump to pages", although some implementations, such as ListConnection's one, allows that. For a pure limit/offset approach we probably need something closer to the default pagination that we have, but return it in an object which has totalCount and the results in a list.

Jul 17 '23 14:07 bellini666

And for curiosity, the official graphql-relay-js does the same for its ArrayConnection, which is a limit/offset approach and cursors are converted to a base64 version of arrayconnection:1234

I'm disappointed but not surprised. There's no excuse for this. I might raise that issue there later.

If your users need a way to "jump to pages" and you don't want to document such a workaround to them, then you should probably not be exposing that using a relay connection but something else instead.

I agree.

Having said all of that, the thing here is that the relay spec is not made with the assumption that you can "jump to pages".

Exactly. That's why tightly coupling optimizer and more with the relay spec is a problem in this library.

For a pure limit/offset approach we probably need something closer to the default pagination that we have, but return it in an object which has totalCount and the results in a list.

That would be the right solution for this problem if this pagination mechanism works with the optimizer, filtering and ordering. Does it? Do you have a rough idea what it would take to add a total count to this pagination mechanism?

Jul 17 '23 14:07 Mapiarz

@salcedo ,

Documentation isn't in line with reality. I'm spending more time reading code and at this point, adding logging at various places to better understand what it's doing, only to be left with more questions about what strawberry-graphql-django should or shouldn't do in whichever circumstance.

Yeah, I'm sorry about that. The documentation indeed needs some love!

If you have any problems with it, feel free to open issues to report those, or even to open PRs fixing them.

Using relay with filters and ordering is mostly working. I was even able to get nested relay resolvers to work but only if I use the following syntax:

Regarding your example here, some thing to note:

Since you are inheriting from relay.Node, you don't actually need to define an id as it will get automatically generated for you as id: GlobalID
The place you mentioned that the frontend has to decode the base64 to pass the pk to the filter, you probably want the id to be received as a GlobalID in the filter as well. The MAP_AUTO_ID_AS_GLOBAL_ID was created for that reason and should affect the filter itself when doing id: auto, but you can also define it as id: GlobalID in the filters if not setting that option

As you can see this deviates from docs because docs say return type of the resolver is a QuerySet[SomeModel] which throws an error. On top of this, trying to convert this to an async resolver was unsuccessful because the Has*Perm extensions don't appear to be async friendly, but I'll get to that in a moment.

The connection implementation expects an iterator/iterable/generator to be returned so it can paginate on top of it, which a QuerySet is and is what you want to return for django. If that is raising an error for you, could you share more details of it? Maybe open a new issue for that.

Also, although you don't actually need to use async resolvers to return a queryset, the Has*Perm should be async friendly. Do you have an example where it did not work properly so that I can take a look?

the optimizer hint is ignored in this case, whereas it works fine if using the custom resolver.

Nested connections have a lot of corner cases which are not yet properly handled, specially at the optimizer level. For example the optimizer will ignored nested connections when finding one because, if it does prefetch_related it, those results will be thrown away later which can make the performance worse. e.g. suppose you have a list of 100 objs, and each one has a connection that leads to 1m more. If the optimizer does his work here, it will prefetch 100m objs to then paginate and retrieve 100 objects for each, meaning that n+1 here is actually more performatic.

Take a look at this comment for a more detailed explanation of this.

Docs mention HasObjPerm which never appeared to enforce object permissions. In the current version HasObjPerm no longer exists and I think this is because it has been replaced by HasRetvalPerm?

Indeed. It was called HasObjPerm on strawberry-django-plus but I renamed it to HasRetvalPerm in here to make it more explicit that we are checking permissions for the returned value in that resolver.

The default object permissions checker does: user.has_perm(perm.perm) or user.has_perm(perm.perm, obj) which checks if a user has a global permission or object permission

Indeed. Maybe we should have an accept global like guardian does? Could you open an issue for this?

and it works but ONLY in nested relay connections

Yes, that's because when using guardian we are filtering the results based on permissions, meaning we don't need to waste resources checking each object again later.

The issue here is the same as above, the lack of an accept_global flag. Having that would solve both issues.

IsAuthenticated when using guardian in default configuration has no effect because the AnonymousUser is always authenticated and NOT is_anonymous - See: https://django-guardian.readthedocs.io/en/stable/configuration.html - Again this is a situation of "what is the intention of strawberry-django vs. what the docs say" leaving the user to either implement their own IsAuthenticated or configure guardian with ANONYMOUS_USER_NAME = None

In this case I actually don't think that it is an issue. In fact the user "is authenticated" based on the fact that there's a default anonymous users, which can even be used to check permissions for.

Having said that, we can probably improve the docs here to mention that, when using guardian and using the anonymous users, that one will be considered as authenticated in this case. Or even add have a setting for consider a guardian's anonymous user as "not authenticated"?

obs. As I mentioned in my reply, be sure to open issues and some reproduction examples for the problems you are finding! I'll take a look in each of them and try to fix ASAP :)

Jul 17 '23 14:07 bellini666

And for curiosity, the official graphql-relay-js does the same for its ArrayConnection, which is a limit/offset approach and cursors are converted to a base64 version of arrayconnection:1234

I'm disappointed but not surprised. There's no excuse for this. I might raise that issue there later.

I just checked the lib out and it doesn't suggest anywhere that using non opaque cursor is the way to achieve 'jump to page'. So nothing to complain there. It's ok to use offset/limit as your cursor implementation, that alone is fine. What's not fine is to ask your clients to interpret your cursor ;)

Jul 17 '23 14:07 Mapiarz

I just checked the lib out and it doesn't suggest anywhere that using non opaque cursor is the way to achieve 'jump to page'. So nothing to complain there. It's ok to use offset/limit as your cursor implementation, that alone is fine. What's not fine is to ask your clients to interpret your cursor ;)

Indeed! That's our secret 🤫 hahaha

But I mentioned that because we actually do that a lot internally in our projects.

That would be the right solution for this problem if this pagination mechanism works with the optimizer, filtering and ordering. Does it? Do you have a rough idea what it would take to add a total count to this pagination mechanism?

Maybe something similar to the relay implementation, but simpler?

Like:

@strawberry.type
class Paginated(Generic[T]):
    totalCount: int
    results: list[T]

Then anytime we see a fruit: Paginated[Fruit] we inject the pagination to the field, the same way pagination=True does, and resolve it properly.

We can even have a strawberry_django.paginated, which acts similar to strawberry_django.connection, that can inject a default resolver if one is not defined, and can decorate a resolver that returns a queryset and handle it properly

Jul 17 '23 14:07 bellini666

@Mapiarz @bellini666 I didn't mean to jump in the middle of this issue with all the XY. I was frustrated by having limited time to solve some difficult issues for work. I am but one of two developers at our company and we have business needs that must come first.

While y'all get these things sorted out, we are shifting away from strawberry and making some improvements to our existing DRF to address the concerns we have, and I will open separate issues when time permits. I would love to PR documentation as well, once we get over some big hurdles here and my time is freed.

tl;dr - My bad. Seeing myself out.

Jul 17 '23 15:07 salcedo

strawberry-django strawberry-django copied to clipboard

Strawberry and Relay potential improvements

Upvote & Fund

strawberry-django
strawberry-django copied to clipboard